... script data load.
On related note an attempt was made to merge rotations with logical
operations. I mean as we know, ARM ISA has merged rotate-n-logical
instructions which can be used here. And they were used to improve
keccak1600-armv4 performance. But not here. Even though this approach
resulted in improvement on Cortex-A53 proportional to reduction of
amount of instructions, ~8%, it didn't exactly worked out on
non-Cortex cores. Presumably because they break merged instructions
to separate μ-ops, which results in higher *operations* count. X-Gene
and Denver went ~20% slower and Apple A7 - 40%. The optimization was
therefore dismissed.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reduce per-round instruction count in Thumb-2 case by 16%. This is
achieved by folding ldr/str pairs to their double-word counterparts.
Reviewed-by: Rich Salz <rsalz@openssl.org>
This is achieved mostly by ~10% reduction of amount of instructions
per round thanks to a) switch to KECCAK_2X variant; b) merge of
almost 1/2 rotations with logical instructions. Performance is
improved on all observed processors except on Cortex-A15. This is
because it's capable of exploiting more parallelism and can execute
original code for same amount of time.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/4057)
"More" refers to the fact that we make active BIT_INTERLEAVE choice
in some specific cases. Update commentary correspondingly.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Improvement is result of combination of data layout ideas from
Keccak Code Package and initial version of this module.
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Rich Salz <rsalz@openssl.org>
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.
AVX-512 results cover even Skylake-X :-)
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
New register usage pattern allows to achieve sligtly better
performance. Not as much as I hoped for. Performance is believed
to be limited by irreconcilable write-back conflicts, rather than
lack of computational resources or data dependencies.
Reviewed-by: Rich Salz <rsalz@openssl.org>
This gives much more freedom to rearrange instructions. This is
unoptimized version, provided for reference. Basically you need
to compare it to initial 29724d0e15
to figure out the key difference.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Curiously enough out-of-order Silvermont benefited most from
optimization, 33%. [Originally mentioned "anomaly" turned to be
misreported frequency scaling problem. Correct results were
collected under older kernel.]
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/3739)
The assembler already knows the actual path to the generated file and,
in other perlasm architectures, is left to manage debug symbols itself.
Notably, in OpenSSL 1.1.x's new build system, which allows a separate
build directory, converting .pl to .s as the scripts currently do result
in the wrong paths.
This also avoids inconsistencies from some of the files using $0 and
some passing in the filename.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3431)
Not exactly everywhere, but in those source files where stdint.h is
included conditionally, or where it will be eventually
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3447)
Fix some comments too
[skip ci]
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3069)
This removes the fips configure option. This option is broken as the
required FIPS code is not available.
FIPS_mode() and FIPS_mode_set() are retained for compatibility, but
FIPS_mode() always returns 0, and FIPS_mode_set() can only be used to
turn FIPS mode off.
Reviewed-by: Stephen Henson <steve@openssl.org>
- harmonize handlers with guidelines and themselves;
- fix some bugs in handlers;
- add missing handlers in chacha and ecp_nistz256 modules;
Reviewed-by: Rich Salz <rsalz@openssl.org>
In non-__KERNEL__ context 32-bit-style __ARMEB__/__ARMEL__ macros were
set in arm_arch.h, which is shared between 32- and 64-bit builds. Since
it's not included in __KERNEL__ case, we have to adhere to official
64-bit pre-defines, __AARCH64EB__/__AARCH64EL__.
[If we are to share more code, it would need similar adjustment.]
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Now that we can link specifically with static libraries, the immediate
need to split ppccap.c (and eventually other *cap.c files) is no more.
This reverts commit e3fb4d3d52.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Having that code in one central object file turned out to cause
trouble when building test/modes_internal_test.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/1883)
The prevailing style seems to not have trailing whitespace, but a few
lines do. This is mostly in the perlasm files, but a few C files got
them after the reformat. This is the result of:
find . -name '*.pl' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
find . -name '*.c' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
find . -name '*.h' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
Then bn_prime.h was excluded since this is a generated file.
Note mkerr.pl has some changes in a heredoc for some help output, but
other lines there lack trailing whitespace too.
Reviewed-by: Kurt Roeckx <kurt@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
MIPS[32|64]R6 is binary and source incompatible with previous MIPS ISA
specifications. Fortunately it's still possible to resolve differences
in source code with standard pre-processor and switching to trap-free
version of addition and subtraction instructions.
Reviewed-by: Richard Levitte <levitte@openssl.org>
IBM argues that in certain scenarios capability query is really
expensive. At the same time it's asserted that query results can
be safely cached, because disabling CPACF is incompatible with
reboot-free operation.
Reviewed-by: Tim Hudson <tjh@openssl.org>
The Unix build was the last to retain the classic build scheme. The
new unified scheme has matured enough, even though some details may
need polishing.
Reviewed-by: Rich Salz <rsalz@openssl.org>
As it turns out branch hints grew as kind of a misconception. In
addition their interpretation by GNU assembler is affected by
assembler flags and can end up with opposite meaning on different
processors. As we have to loose quite a lot on misinterprerations,
especially on newer processors, we just omit them altogether.
Reviewed-by: Tim Hudson <tjh@openssl.org>
The reason to do so is that some of the generators detect PIC flags
like -fPIC and -KPIC, and those are normally delivered in LD_CFLAGS.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Some of these scripts would recognise an output parameter if it looks
like a file path. That works both in both the classic and new build
schemes. Some fo these scripts would only recognise it if it's a
basename (i.e. no directory component). Those need to be corrected,
as the output parameter in the new build scheme is more likely to
contain a directory component than not.
Reviewed-by: Andy Polyakov <appro@openssl.org>
Before the 'Introduce the "pic" / "no-pic" config option' commit, the
shared_cflag value for the chosen config would be part of the make
variable CFLAG, which got replicated into CFLAGS and ASFLAGS.
Since said commit, the shared_cflag value has become a make variable
of its own, SHARED_CFLAG (which is left empty in a "no-pic" build).
However, ASFLAGS was forgotten. That's what's corrected with this
change.
Reviewed-by: Andy Polyakov <appro@openssl.org>
This gets rid of the BEGINRAW..ENDRAW sections in crypto/sha/build.info.
This also moves the assembler generating perl scripts to take the
output file name as last command line argument, where necessary.
Reviewed-by: Andy Polyakov <appro@openssl.org>
Make all scripts produce .S, make interpretation of $(CFLAGS)
pre-processor's responsibility, start accepting $(PERLASM_SCHEME).
[$(PERLASM_SCHEME) is redundant in this case, because there are
no deviataions between Solaris and Linux assemblers. This is
purely to unify .pl->.S handling across all targets.]
Reviewed-by: Richard Levitte <levitte@openssl.org>
Most of the assembly uses constants from arm_arch.h, but a few references to
ARMV7_NEON don't. Consistently use the macros everywhere.
Signed-off-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
The HOST_c2l() macro assigns the value to the specified variable, but also
evaluates to the same value. Which we ignore, triggering a warning.
To fix this, just cast it to void like we did in commit 08e553644
("Fix some clang warnings.") for a bunch of other instances.
Signed-off-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
This takes us away from the idea that we know exactly how our static
libraries are going to get used. Instead, we make them available to
build shareable things with, be it other shared libraries or DSOs.
On the other hand, we also have greater control of when the shared
library cflags. They will never be used with object files meant got
binaries, such as apps/openssl or test/test*.
With unified, we take this a bit further and prepare for having to
deal with extra cflags specifically to be used with DSOs (dynamic
engines), libraries and binaries (applications).
Reviewed-by: Rich Salz <rsalz@openssl.org>
All those flags existed because we had all the dependencies versioned
in the repository, and wanted to have it be consistent, no matter what
the local configuration was. Now that the dependencies are gone from
the versioned Makefile.ins, it makes much more sense to use the exact
same flags as when compiling the object files.
Reviewed-by: Rich Salz <rsalz@openssl.org>
It seems that on some platforms, the perlasm scripts call the C
compiler for certain checks. These scripts need the environment
variable CC to have the C compiler command.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Now that we have the foundation for the "unified" build scheme in
place, we add build.info files. They have been generated from the
Makefiles in the same directories. Things that are platform specific
will appear in later commits.
Reviewed-by: Andy Polyakov <appro@openssl.org>
This was done by the following
find . -name '*.[ch]' | /tmp/pl
where /tmp/pl is the following three-line script:
print unless $. == 1 && m@/\* .*\.[ch] \*/@;
close ARGV if eof; # Close file to reset $.
And then some hand-editing of other files.
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
Remove lint, tags, dclean, tests.
This is prep for a new makedepend scheme.
This is temporary pending unified makefile, and might help it.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Create Makefile's from Makefile.in
Rename Makefile.org to Makefile.in
Rename Makefiles to Makefile.in
Address review feedback from Viktor and Richard
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>