Reduce per-round instruction count in Thumb-2 case by 16%. This is
achieved by folding ldr/str pairs to their double-word counterparts.
Reviewed-by: Rich Salz <rsalz@openssl.org>
This is achieved mostly by ~10% reduction of amount of instructions
per round thanks to a) switch to KECCAK_2X variant; b) merge of
almost 1/2 rotations with logical instructions. Performance is
improved on all observed processors except on Cortex-A15. This is
because it's capable of exploiting more parallelism and can execute
original code for same amount of time.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/4057)
"More" refers to the fact that we make active BIT_INTERLEAVE choice
in some specific cases. Update commentary correspondingly.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Improvement is result of combination of data layout ideas from
Keccak Code Package and initial version of this module.
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Rich Salz <rsalz@openssl.org>
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.
AVX-512 results cover even Skylake-X :-)
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
New register usage pattern allows to achieve sligtly better
performance. Not as much as I hoped for. Performance is believed
to be limited by irreconcilable write-back conflicts, rather than
lack of computational resources or data dependencies.
Reviewed-by: Rich Salz <rsalz@openssl.org>
This gives much more freedom to rearrange instructions. This is
unoptimized version, provided for reference. Basically you need
to compare it to initial 29724d0e15
to figure out the key difference.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Curiously enough out-of-order Silvermont benefited most from
optimization, 33%. [Originally mentioned "anomaly" turned to be
misreported frequency scaling problem. Correct results were
collected under older kernel.]
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/3739)
The assembler already knows the actual path to the generated file and,
in other perlasm architectures, is left to manage debug symbols itself.
Notably, in OpenSSL 1.1.x's new build system, which allows a separate
build directory, converting .pl to .s as the scripts currently do result
in the wrong paths.
This also avoids inconsistencies from some of the files using $0 and
some passing in the filename.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3431)
Not exactly everywhere, but in those source files where stdint.h is
included conditionally, or where it will be eventually
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3447)
Fix some comments too
[skip ci]
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3069)
This removes the fips configure option. This option is broken as the
required FIPS code is not available.
FIPS_mode() and FIPS_mode_set() are retained for compatibility, but
FIPS_mode() always returns 0, and FIPS_mode_set() can only be used to
turn FIPS mode off.
Reviewed-by: Stephen Henson <steve@openssl.org>
- harmonize handlers with guidelines and themselves;
- fix some bugs in handlers;
- add missing handlers in chacha and ecp_nistz256 modules;
Reviewed-by: Rich Salz <rsalz@openssl.org>