Commit graph

535 commits

Author SHA1 Message Date
Richard Levitte
2bd3b626dd Make a few more asm modules conform: last argument is output file
Fixes #5310

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5315)
2018-03-08 19:31:41 +01:00
Andy Polyakov
b761ff4e77 sha/asm/keccak1600-armv8.pl: add hardware-assisted ARMv8.2 subroutines.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5358)
2018-02-19 14:15:31 +01:00
Richard Levitte
722c9762f2 Harmonize the make variables across all known platforms families
The make variables LIB_CFLAGS, DSO_CFLAGS and so on were used in
addition to CFLAGS and so on.  This works without problem on Unix and
Windows, where options with different purposes (such as -D and -I) can
appear anywhere on the command line and get accumulated as they come.
This is not necessarely so on VMS.  For example, macros must all be
collected and given through one /DEFINE, and the same goes for
inclusion directories (/INCLUDE).

So, to harmonize all platforms, we repurpose make variables starting
with LIB_, DSO_ and BIN_ to be all encompassing variables that
collects the corresponding values from CFLAGS, CPPFLAGS, DEFINES,
INCLUDES and so on together with possible config target values
specific for libraries DSOs and programs, and use them instead of the
general ones everywhere.

This will, for example, allow VMS to use the exact same generators for
generated files that go through cpp as all other platforms, something
that has been impossible to do safely before now.

Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5357)
2018-02-14 17:13:53 +01:00
Matt Caswell
6738bf1417 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-02-13 13:59:25 +00:00
Andy Polyakov
af0fcf7b46 sha/asm/sha512-armv8.pl: add hardware-assisted SHA512 subroutine.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-02-12 14:05:05 +01:00
Richard Levitte
8c3bc594e0 Processing GNU-style "make variables" - separate CPP flags from C flags
C preprocessor flags get separated from C flags, which has the
advantage that we don't get loads of macro definitions and inclusion
directory specs when linking shared libraries, DSOs and programs.

This is a step to add support for "make variables" when configuring.

Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5177)
2018-01-28 07:26:10 +01:00
Pauli
4bed94f0c1 SHA512/224 and SHA512/256
Support added for these two digests, available only via the EVP interface.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5093)
2018-01-24 07:09:46 +10:00
Andy Polyakov
24d06e8ca0 Add sha/asm/keccak1600-avx512vl.pl.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4948)
2017-12-22 12:38:40 +01:00
Andy Polyakov
7533162322 ARMv8 assembly pack: add Qualcomm Kryo results.
[skip ci]

Reviewed-by: Tim Hudson <tjh@openssl.org>
2017-11-13 11:13:00 +01:00
Josh Soref
46f4e1bec5 Many spelling fixes/typo's corrected.
Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
2017-11-11 19:03:10 -05:00
Patrick Steuer
bc4e831ccd s390x assembly pack: extend s390x capability vector.
Extend the s390x capability vector to store the longer facility list
available from z13 onwards. The bits indicating the vector extensions
are set to zero, if the kernel does not enable the vector facility.

Also add capability bits returned by the crypto instructions' query
functions.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4542)
2017-10-30 14:31:32 +01:00
KaoruToda
26a7d938c9 Remove parentheses of return.
Since return is inconsistent, I removed unnecessary parentheses and
unified them.

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4541)
2017-10-18 16:05:06 +01:00
Patrick Steuer
af1d638730 s390x assembly pack: remove capability double-checking.
An instruction's QUERY function is executed at initialization, iff the required
MSA level is installed. Therefore, it is sufficient to check the bits returned
by the QUERY functions. The MSA level does not have to be checked at every
function call.
crypto/aes/asm/aes-s390x.pl: The AES key schedule must be computed if the
required KM or KMC function codes are not available. Formally, the availability
of a KMC function code does not imply the availability of the corresponding KM
function code.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4501)
2017-10-17 21:55:33 +02:00
Rich Salz
e3713c365c Remove email addresses from source code.
Names were not removed.
Some comments were updated.
Replace Andy's address with openssl.org

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/4516)
2017-10-13 10:06:59 -04:00
Andy Polyakov
236dd46339 sha/asm/keccak1600-armv8.pl: fix return value buglet and ...
... script data load.

On related note an attempt was made to merge rotations with logical
operations. I mean as we know, ARM ISA has merged rotate-n-logical
instructions which can be used here. And they were used to improve
keccak1600-armv4 performance. But not here. Even though this approach
resulted in improvement on Cortex-A53 proportional to reduction of
amount of instructions, ~8%, it didn't exactly worked out on
non-Cortex cores. Presumably because they break merged instructions
to separate μ-ops, which results in higher *operations* count. X-Gene
and Denver went ~20% slower and Apple A7 - 40%. The optimization was
therefore dismissed.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-09-09 19:09:36 +02:00
Rich Salz
b2db9c18b2 MSC_VER <= 1200 isn't supported; remove dead code
VisualStudio 6 and earlier aren't supported.

Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4263)
2017-08-27 11:35:39 -04:00
Andy Polyakov
e0584e96c1 sha/asm/keccak1600-armv4.pl: optimize for Thumb-2.
Reduce per-round instruction count in Thumb-2 case by 16%. This is
achieved by folding ldr/str pairs to their double-word counterparts.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-08-16 20:25:20 +02:00
Andy Polyakov
3c1a60e56f sha/asm/keccak1600-avx512.pl: fix buglet in SHA3_squeeze tail.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-08-12 12:23:31 +02:00
Andy Polyakov
d9ca12cbf6 sha/asm/keccak1600-armv4.pl: improve non-NEON performance by ~10%.
This is achieved mostly by ~10% reduction of amount of instructions
per round thanks to a) switch to KECCAK_2X variant; b) merge of
almost 1/2 rotations with logical instructions. Performance is
improved on all observed processors except on Cortex-A15. This is
because it's capable of exploiting more parallelism and can execute
original code for same amount of time.

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/4057)
2017-08-02 23:22:28 +02:00
Andy Polyakov
5d010e3f10 sha/keccak1600.c: choose more sensible default parameters.
"More" refers to the fact that we make active BIT_INTERLEAVE choice
in some specific cases. Update commentary correspondingly.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-08-01 22:42:35 +02:00
Xiaoyin Liu
bac5b39c96 Fix typo in sha1-thumb.pl
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4056)
2017-07-30 21:26:38 -04:00
Andy Polyakov
c363ce55f2 sha/keccak1600.c: build and make it work with strict warnings.
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3943)
2017-07-25 21:38:48 +02:00
Andy Polyakov
e3c79f0f19 sha/asm/keccak1600-avx512.pl: improve performance by 17%.
Improvement is result of combination of data layout ideas from
Keccak Code Package and initial version of this module.

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-24 21:23:01 +02:00
Andy Polyakov
0d7903f83f sha/asm/keccak1600-avx512.pl: absorb bug-fix and minor optimization.
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-21 14:12:14 +02:00
Andy Polyakov
64d92d7498 x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.

AVX-512 results cover even Skylake-X :-)

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-21 14:07:32 +02:00
Andy Polyakov
d212b98b36 sha/asm/keccak1600-avx2.pl: optimized remodelled version.
New register usage pattern allows to achieve sligtly better
performance. Not as much as I hoped for. Performance is believed
to be limited by irreconcilable write-back conflicts, rather than
lack of computational resources or data dependencies.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-15 23:04:38 +02:00
Andy Polyakov
91dbdc63bd sha/asm/keccak1600-avx2.pl: remodel register usage.
This gives much more freedom to rearrange instructions. This is
unoptimized version, provided for reference. Basically you need
to compare it to initial 29724d0e15
to figure out the key difference.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-15 23:04:18 +02:00
Andy Polyakov
c7c7a8e601 Optimize sha/asm/keccak1600-avx2.pl.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-10 10:16:42 +02:00
Andy Polyakov
29724d0e15 Add sha/asm/keccak1600-avx2.pl.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-10 10:16:31 +02:00
Andy Polyakov
313fa47fea Add sha/asm/keccak1600-avx512.pl.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/3861)
2017-07-07 10:04:33 +02:00
Andy Polyakov
b4f2a462b7 sha/keccak1600.c: internalize KeccakF1600 and simplify SHA3_absorb.
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
2017-07-03 18:18:10 +02:00
Andy Polyakov
edbc681d22 sha/asm/keccak1600-x86_64.pl: close gap with Keccak Code Package.
[Also typo and readability fixes. Ryzen result is added.]

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
2017-07-03 18:18:02 +02:00
Andy Polyakov
b547aba954 sha/asm/keccak1600-s390x.pl: typo and readability, minor size optimization.
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
2017-07-03 18:17:55 +02:00
Andy Polyakov
54f8f9a1ed x86_64 assembly pack: fill some blanks in Ryzen results.
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
2017-07-03 18:17:00 +02:00
Andy Polyakov
7807267bed Add sha/asm/keccak1600-s390x.pl.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2017-06-29 21:16:02 +02:00
Andy Polyakov
d6f0c94a65 sha/asm/keccak1600-x86_64.pl: add CFI directives.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2017-06-29 21:15:56 +02:00
Andy Polyakov
a1613840dd sha/asm/keccak1600-x86_64.pl: optimize by re-ordering instructions.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2017-06-29 21:15:51 +02:00
Andy Polyakov
a078d9dfa9 sha/asm/keccak1600-x86_64.pl: remove redundant moves.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2017-06-29 21:15:45 +02:00
Andy Polyakov
64aef3f53d Add sha/asm/keccak1600-x86_64.pl.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2017-06-29 21:15:09 +02:00
Andy Polyakov
a163e60d95 sha/asm/keccak1600-mmx.pl: optimize for Atom and add comparison data.
Curiously enough out-of-order Silvermont benefited most from
optimization, 33%. [Originally mentioned "anomaly" turned to be
misreported frequency scaling problem. Correct results were
collected under older kernel.]

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/3739)
2017-06-24 09:42:14 +02:00
Andy Polyakov
415248e1e1 Add sha/asm/keccak1600-mmx.pl, x86 MMX module.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/3739)
2017-06-24 09:42:08 +02:00
Andy Polyakov
b5cdec2fea sha/asm/sha512p8-ppc.pl: add POWER8 performance data.
[skip ci]

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3705)
2017-06-21 16:26:59 +02:00
Andy Polyakov
53ddf7dd05 Add Keccak-1600 modules for PPC64 and POWER8.
[skip ci]

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3705)
2017-06-21 16:24:36 +02:00
Andy Polyakov
1d23bbccd3 Add sha/asm/keccak1600-c64x.pl
[skip ci]

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/3708)
2017-06-21 15:21:47 +02:00
Andy Polyakov
5eb2dd88b3 Add sha/asm/keccak1600-armv8.pl.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-06-15 21:53:30 +02:00
Andy Polyakov
6dad1efef7 sha/asm/keccak1600-armv4.pl: switch to more efficient bit interleaving algorithm.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-06-08 20:21:31 +02:00
Andy Polyakov
13603583b3 sha/keccak1600.c: switch to more efficient bit interleaving algorithm.
[Also bypass sizeof(void *) == 8 check on some platforms.]

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-06-08 20:21:04 +02:00
Andy Polyakov
367c552790 sha/asm/keccak1600-armv4.pl: add NEON code path.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-06-06 19:54:29 +02:00
Andy Polyakov
56676f877d sha/asm/keccak1600-armv4.pl: add SHA3_absorb and SHA3_squeeze.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-06-06 19:54:24 +02:00
Andy Polyakov
5371810714 sha/asm/keccak1600-armv4.pl: optimization based on profiler feedback.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-06-06 19:54:19 +02:00