Biggest part, ~7%, of improvement resulted from omitting constants'
table index increment in each round. And minor part from rescheduling
instructions. Apparently POWER9 (and POWER8) manage to dispatch
instructions more efficiently if they are laid down as if they have
no latency...
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
As it turns out originally published results were skewed by "turbo"
mode. VM apparently remains oblivious to dynamic frequency scaling,
and reports that processor operates at "base" frequency at all times.
While actual frequency gets increased under load.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
Around 138 distinct errors found and fixed; thanks!
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)