OPENSSL_memcmp is a must in GCM decrypt and general-purpose loop takes
quite a portion of execution time for short inputs, more than GHASH for
few-byte inputs according to profiler. Special 16-byte case takes it off
top five list in profiler output.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6312)
This patch fixes two issues in the ia32 RDRAND assembly code that result in a
(possibly significant) loss of entropy.
The first, less significant, issue is that, by returning success as 0 from
OPENSSL_ia32_rdrand() and OPENSSL_ia32_rdseed(), a subtle bias was introduced.
Specifically, because the assembly routine copied the remaining number of
retries over the result when RDRAND/RDSEED returned 'successful but zero', a
bias towards values 1-8 (primarily 8) was introduced.
The second, more worrying issue was that, due to a mixup in registers, when a
buffer that was not size 0 or 1 mod 8 was passed to OPENSSL_ia32_rdrand_bytes
or OPENSSL_ia32_rdseed_bytes, the last (n mod 8) bytes were all the same value.
This issue impacts only the 64-bit variant of the assembly.
This change fixes both issues by first eliminating the only use of
OPENSSL_ia32_rdrand, replacing it with OPENSSL_ia32_rdrand_bytes, and fixes the
register mixup in OPENSSL_ia32_rdrand_bytes. It also adds a sanity test for
OPENSSL_ia32_rdrand_bytes and OPENSSL_ia32_rdseed_bytes to help catch problems
of this nature in the future.
Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5342)
It was observed that AVX512 code paths can negatively affect overall
Skylake-X system performance. But we are talking specifically about
512-bit code, while AVX512VL, 256-bit variant of AVX512F instructions,
is supposed to fly as smooth as AVX2. Which is why it remains unmasked.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4838)
Originally it was thought that it's possible to use AVX512VL+BW
instructions with XMM and YMM registers without kernel enabling
ZMM support, but it turned to be wrong assumption.
Reviewed-by: Rich Salz <rsalz@openssl.org>
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.
AVX-512 results cover even Skylake-X :-)
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
Exteneded feature flags were not pulled on AMD processors, as result
a number of extensions were effectively masked on Ryzen. Original fix
for x86_64cpuid.pl addressed this problem, but messed up processor
vendor detection. This fix moves extended feature detection past
basic feature detection where it belongs. 32-bit counterpart is
harmonized too.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Exteneded feature flags were not pulled on AMD processors, as result a
number of extensions were effectively masked on Ryzen. It should have
been reported for Excavator since it implements AVX2 extension, but
apparently nobody noticed or cared...
Reviewed-by: Rich Salz <rsalz@openssl.org>
Add copyright to most .pl files
This does NOT cover any .pl file that has other copyright in it.
Most of those are Andy's but some are public domain.
Fix typo's in some existing files.
Reviewed-by: Richard Levitte <levitte@openssl.org>