Commit graph

25 commits

Author SHA1 Message Date
Josh Soref
46f4e1bec5 Many spelling fixes/typo's corrected.
Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
2017-11-11 19:03:10 -05:00
Andy Polyakov
64d92d7498 x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.

AVX-512 results cover even Skylake-X :-)

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-21 14:07:32 +02:00
Andy Polyakov
54f8f9a1ed x86_64 assembly pack: fill some blanks in Ryzen results.
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
2017-07-03 18:17:00 +02:00
Andy Polyakov
0a5d1a38f2 poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_8x.
As hinted by its name new subroutine processes 8 input blocks in
parallel by loading data to 512-bit registers. It still needs more
work, as it needs to handle some specific input lengths better.
In this sense it's yet another intermediate step...

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-03-22 10:59:59 +01:00
Andy Polyakov
6cbfd94d08 x86_64 assembly pack: add some Ryzen performance results.
Reviewed-by: Tim Hudson <tjh@openssl.org>
2017-03-22 10:58:01 +01:00
Andy Polyakov
c2b935904a poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_4x.
As hinted by its name new subroutine processes 4 input blocks in
parallel. It still operates on 256-bit registers and is just
another step toward full-blown AVX512IFMA procedure.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-03-13 18:48:34 +01:00
Andy Polyakov
e052083cc7 poly1305/asm/poly1305-x86_64.pl: minor AVX512 optimization.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-26 21:27:54 +01:00
Andy Polyakov
1c47e8836f poly1305/asm/poly1305-x86_64.pl: add CFI annotations.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-26 21:26:07 +01:00
Andy Polyakov
fd910ef959 poly1305/asm/poly1305-x86_64.pl: add VPMADD52 code path.
This is initial and minimal single-block implementation.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-25 18:36:41 +01:00
Andy Polyakov
73e8a5c826 poly1305/asm/poly1305-x86_64.pl: switch to vpermdd in table expansion.
Effectively it's minor size optimization, 5-6% per affected subroutine.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-25 18:36:37 +01:00
Andy Polyakov
c1e1fc500d poly1305/asm/poly1305-x86_64.pl: optimize AVX512 code path.
On pre-Skylake best optimization strategy was balancing port-specific
instructions, while on Skylake minimizing the sheer amount appears
more sensible.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-25 18:35:45 +01:00
Andy Polyakov
1ea01427c5 poly1305/asm/poly1305-x86_64.pl: allow nasm to assemble AVX512 code.
chacha/asm/chacha-x86_64.pl: refine nasm version detection logic.

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-12-15 17:57:50 +01:00
Andy Polyakov
abb8c44fba x86_64 assembly pack: add AVX512 ChaCha20 and Poly1305 code paths.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2016-12-12 10:58:04 +01:00
Andy Polyakov
ace05265d2 x86_64 assembly pack: add Goldmont performance results.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-10-24 13:01:13 +02:00
Andy Polyakov
cfe1d9929e x86_64 assembly pack: tolerate spaces in source directory name.
[as it is now quoting $output is not required, but done just in case]

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-29 14:12:51 +02:00
Rich Salz
6aa36e8e5a Add OpenSSL copyright to .pl files
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-21 08:23:39 -04:00
Andy Polyakov
3992e8c023 poly1305/asm/poly1305-x86_64.pl: contain symbols within shared lib.
We don't need it, but external users might find it handy.

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-06 09:48:15 +02:00
Andy Polyakov
284116575d poly1305/asm/poly1305-x86_64.pl: make it cross-compile.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-06 09:46:39 +02:00
Andy Polyakov
6ca3e6e779 poly1305/asm/poly1305-x86_64.pl: not all assemblers manage << in constants.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-04-20 09:51:27 +02:00
Andy Polyakov
4b8736a22e crypto/poly1305: don't break carry chains.
RT#4483

[poly1305-armv4.pl: remove redundant #ifdef __thumb2__]
[poly1305-ppc*.pl: presumably more accurate benchmark results]

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-04-04 16:56:20 +02:00
Andy Polyakov
2460c7f133 poly1305/asm/poly1305-x86_64.pl: make it work with linux-x32.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-03-15 23:58:31 +01:00
Andy Polyakov
1ea8ae5090 poly1305/asm/poly1305-*.pl: flip horizontal add and reduction.
Formally only 32-bit AVX2 code path needs this, but I choose to
harmonize all vector code paths.

RT#4346
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-03-02 13:11:38 +01:00
Andy Polyakov
4ef29667ab poly1305/asm/poly1305-x86_64.pl: MacOS X portability fix.
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
2016-02-11 20:47:33 +01:00
Andy Polyakov
a85dbf115c poly1305/asm/poly1305-x86_64.pl: fix mingw64 build.
Reviewed-by: Tim Hudson <tjh@openssl.org>
2016-02-11 20:47:01 +01:00
Andy Polyakov
a98c648e40 x86[_64] assembly pack: add ChaCha20 and Poly1305 modules.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2016-02-10 10:31:14 +01:00