openssl

Author	SHA1	Message	Date
Josh Soref	46f4e1bec5	Many spelling fixes/typo's corrected. Around 138 distinct errors found and fixed; thanks! Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/3459)	2017-11-11 19:03:10 -05:00
Andy Polyakov	64d92d7498	x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results. "Optimize" is in quotes because it's rather a "salvage operation" for now. Idea is to identify processor capability flags that drive Knights Landing to suboptimal code paths and mask them. Two flags were identified, XSAVE and ADCX/ADOX. Former affects choice of AES-NI code path specific for Silvermont (Knights Landing is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are effectively mishandled at decode time. In both cases we are looking at ~2x improvement. AVX-512 results cover even Skylake-X :-) Hardware used for benchmarking courtesy of Atos, experiments run by Romain Dolbeau <romain.dolbeau@atos.net>. Kudos! Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-07-21 14:07:32 +02:00
Andy Polyakov	54f8f9a1ed	x86_64 assembly pack: fill some blanks in Ryzen results. Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>	2017-07-03 18:17:00 +02:00
Andy Polyakov	0a5d1a38f2	poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_8x. As hinted by its name new subroutine processes 8 input blocks in parallel by loading data to 512-bit registers. It still needs more work, as it needs to handle some specific input lengths better. In this sense it's yet another intermediate step... Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-03-22 10:59:59 +01:00
Andy Polyakov	6cbfd94d08	x86_64 assembly pack: add some Ryzen performance results. Reviewed-by: Tim Hudson <tjh@openssl.org>	2017-03-22 10:58:01 +01:00
Andy Polyakov	c2b935904a	poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_4x. As hinted by its name new subroutine processes 4 input blocks in parallel. It still operates on 256-bit registers and is just another step toward full-blown AVX512IFMA procedure. Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-03-13 18:48:34 +01:00
Andy Polyakov	e052083cc7	poly1305/asm/poly1305-x86_64.pl: minor AVX512 optimization. Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-02-26 21:27:54 +01:00
Andy Polyakov	1c47e8836f	poly1305/asm/poly1305-x86_64.pl: add CFI annotations. Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-02-26 21:26:07 +01:00
Andy Polyakov	fd910ef959	poly1305/asm/poly1305-x86_64.pl: add VPMADD52 code path. This is initial and minimal single-block implementation. Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-02-25 18:36:41 +01:00
Andy Polyakov	73e8a5c826	poly1305/asm/poly1305-x86_64.pl: switch to vpermdd in table expansion. Effectively it's minor size optimization, 5-6% per affected subroutine. Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-02-25 18:36:37 +01:00
Andy Polyakov	c1e1fc500d	poly1305/asm/poly1305-x86_64.pl: optimize AVX512 code path. On pre-Skylake best optimization strategy was balancing port-specific instructions, while on Skylake minimizing the sheer amount appears more sensible. Reviewed-by: Rich Salz <rsalz@openssl.org>	2017-02-25 18:35:45 +01:00
Andy Polyakov	1ea01427c5	poly1305/asm/poly1305-x86_64.pl: allow nasm to assemble AVX512 code. chacha/asm/chacha-x86_64.pl: refine nasm version detection logic. Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-12-15 17:57:50 +01:00
Andy Polyakov	abb8c44fba	x86_64 assembly pack: add AVX512 ChaCha20 and Poly1305 code paths. Reviewed-by: Rich Salz <rsalz@openssl.org>	2016-12-12 10:58:04 +01:00
Andy Polyakov	ace05265d2	x86_64 assembly pack: add Goldmont performance results. Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-10-24 13:01:13 +02:00
Andy Polyakov	cfe1d9929e	x86_64 assembly pack: tolerate spaces in source directory name. [as it is now quoting $output is not required, but done just in case] Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-05-29 14:12:51 +02:00
Rich Salz	6aa36e8e5a	Add OpenSSL copyright to .pl files Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-05-21 08:23:39 -04:00
Andy Polyakov	3992e8c023	poly1305/asm/poly1305-x86_64.pl: contain symbols within shared lib. We don't need it, but external users might find it handy. Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-05-06 09:48:15 +02:00
Andy Polyakov	284116575d	poly1305/asm/poly1305-x86_64.pl: make it cross-compile. Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-05-06 09:46:39 +02:00
Andy Polyakov	6ca3e6e779	poly1305/asm/poly1305-x86_64.pl: not all assemblers manage << in constants. Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-04-20 09:51:27 +02:00
Andy Polyakov	4b8736a22e	crypto/poly1305: don't break carry chains. RT#4483 [poly1305-armv4.pl: remove redundant #ifdef __thumb2__] [poly1305-ppc*.pl: presumably more accurate benchmark results] Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-04-04 16:56:20 +02:00
Andy Polyakov	2460c7f133	poly1305/asm/poly1305-x86_64.pl: make it work with linux-x32. Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-03-15 23:58:31 +01:00
Andy Polyakov	1ea8ae5090	poly1305/asm/poly1305-*.pl: flip horizontal add and reduction. Formally only 32-bit AVX2 code path needs this, but I choose to harmonize all vector code paths. RT#4346 Reviewed-by: Richard Levitte <levitte@openssl.org>	2016-03-02 13:11:38 +01:00
Andy Polyakov	4ef29667ab	poly1305/asm/poly1305-x86_64.pl: MacOS X portability fix. Reviewed-by: Viktor Dukhovni <viktor@openssl.org>	2016-02-11 20:47:33 +01:00
Andy Polyakov	a85dbf115c	poly1305/asm/poly1305-x86_64.pl: fix mingw64 build. Reviewed-by: Tim Hudson <tjh@openssl.org>	2016-02-11 20:47:01 +01:00
Andy Polyakov	a98c648e40	x86[_64] assembly pack: add ChaCha20 and Poly1305 modules. Reviewed-by: Rich Salz <rsalz@openssl.org>	2016-02-10 10:31:14 +01:00

25 commits