The 128-byte vectors are extensively used in chacha20_poly1305_tls_cipher
and dedicated code path is ~30-50% faster on most platforms.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6626)
It's kind of a "brown-bag" bug, as I did recognize the problem and
verified an ad-hoc solution, but failed to follow up with cross-checks
prior filing previous merge request.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6435)
This comes at cost of minor 2.5% regression on G4, which is reasonable
trade-off. [Further improve compliance with ABI requirements.]
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
As it turns out originally published results were skewed by "turbo"
mode. VM apparently remains oblivious to dynamic frequency scaling,
and reports that processor operates at "base" frequency at all times.
While actual frequency gets increased under load.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
32-bit vector rotate instruction was defined from beginning, it
not being used from the start must be a brain-slip...
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6363)
The make variables LIB_CFLAGS, DSO_CFLAGS and so on were used in
addition to CFLAGS and so on. This works without problem on Unix and
Windows, where options with different purposes (such as -D and -I) can
appear anywhere on the command line and get accumulated as they come.
This is not necessarely so on VMS. For example, macros must all be
collected and given through one /DEFINE, and the same goes for
inclusion directories (/INCLUDE).
So, to harmonize all platforms, we repurpose make variables starting
with LIB_, DSO_ and BIN_ to be all encompassing variables that
collects the corresponding values from CFLAGS, CPPFLAGS, DEFINES,
INCLUDES and so on together with possible config target values
specific for libraries DSOs and programs, and use them instead of the
general ones everywhere.
This will, for example, allow VMS to use the exact same generators for
generated files that go through cpp as all other platforms, something
that has been impossible to do safely before now.
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5357)
C preprocessor flags get separated from C flags, which has the
advantage that we don't get loads of macro definitions and inclusion
directory specs when linking shared libraries, DSOs and programs.
This is a step to add support for "make variables" when configuring.
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5177)
256-bit AVX512VL was estimated to deliver ~50% improvement over AVX2
and it did live up to the expectations.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4838)
The __clang__-guarded #defines cause gas to complain if clang is passed
-fno-integrated-as. Emitting .syntax unified when those are used fixes
this. This matches the change made to ghash-armv4.pl in
6cf412c473.
Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/3694)
Only chacha_internal_test is affected, since this path is not used
from EVP.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4758)
Around 138 distinct errors found and fixed; thanks!
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.
AVX-512 results cover even Skylake-X :-)
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
The assembler already knows the actual path to the generated file and,
in other perlasm architectures, is left to manage debug symbols itself.
Notably, in OpenSSL 1.1.x's new build system, which allows a separate
build directory, converting .pl to .s as the scripts currently do result
in the wrong paths.
This also avoids inconsistencies from some of the files using $0 and
some passing in the filename.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3431)
- harmonize handlers with guidelines and themselves;
- fix some bugs in handlers;
- add missing handlers in chacha and ecp_nistz256 modules;
Reviewed-by: Rich Salz <rsalz@openssl.org>
In order to minimize dependency on assembler version a number of
post-SSE2 instructions are encoded manually. But in order to simplify
the procedure only register operands are considered. Non-register
operands are passed down to assembler. Module in question uses pshufb
with memory operands, and old [GNU] assembler can't handle it.
Fortunately in this case it's possible skip just the problematic
segment without skipping SSSE3 support altogether.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
Now that we can link specifically with static libraries, the immediate
need to split ppccap.c (and eventually other *cap.c files) is no more.
This reverts commit e3fb4d3d52.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Having that code in one central object file turned out to cause
trouble when building test/modes_internal_test.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/1883)
The prevailing style seems to not have trailing whitespace, but a few
lines do. This is mostly in the perlasm files, but a few C files got
them after the reformat. This is the result of:
find . -name '*.pl' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
find . -name '*.c' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
find . -name '*.h' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
Then bn_prime.h was excluded since this is a generated file.
Note mkerr.pl has some changes in a heredoc for some help output, but
other lines there lack trailing whitespace too.
Reviewed-by: Kurt Roeckx <kurt@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Some of the instructions used in latest additions are extension
ones. There is no real reason to limit ourselves to specific
processors, so [re-]adhere to base instruction set.
RT#4548
Reviewed-by: Rich Salz <rsalz@openssl.org>
_ctr32 in function name refers to 32-bit counter, but it was implementing
64-bit one. This didn't pose problem to EVP, but 64-bit counter was just
misleading.
RT#4512
Reviewed-by: Richard Levitte <levitte@openssl.org>
Usage of $ymm variable is a bit misleading here, it doesn't refer
to %ymm register bank, but rather to VEX instruction encoding,
which AMD XOP code path depends on.
Reviewed-by: Richard Levitte <levitte@openssl.org>
The Unix build was the last to retain the classic build scheme. The
new unified scheme has matured enough, even though some details may
need polishing.
Reviewed-by: Rich Salz <rsalz@openssl.org>
The reason to do so is that some of the generators detect PIC flags
like -fPIC and -KPIC, and those are normally delivered in LD_CFLAGS.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Some of these scripts would recognise an output parameter if it looks
like a file path. That works both in both the classic and new build
schemes. Some fo these scripts would only recognise it if it's a
basename (i.e. no directory component). Those need to be corrected,
as the output parameter in the new build scheme is more likely to
contain a directory component than not.
Reviewed-by: Andy Polyakov <appro@openssl.org>