It's not clear whether this inconsistency could lead to an actual
computation error, but it involved a BIGNUM being passed around the
montgomery logic in an inconsistent state. This was found using flags
-DBN_DEBUG -DBN_DEBUG_RAND, and working backwards from this assertion
in 'ectest';
ectest: bn_mul.c:960: BN_mul: Assertion `(_bnum2->top == 0) ||
(_bnum2->d[_bnum2->top - 1] != 0)' failed
Signed-off-by: Geoff Thorpe <geoff@openssl.org>
Improve RSA sing performance by 20-30% by:
- switching from floating-point to integer conditional moves;
- daisy-chaining sqr-sqr-sqr-sqr-sqr-mul sequences;
- using MONTMUL even during powers table setup;
x86_64 platform. It targets specifically RSA1024 sign (using ideas
from http://eprint.iacr.org/2011/239) and adds more than 10% on most
platforms. Overall performance improvement relative to 1.0.0 is ~40%
in average, with best result of 54% on Westmere. Incidentally ~40%
is average improvement even for longer key lengths.
knock-on work than expected - they've been extracted into a patch
series that can be completed elsewhere, or in a different branch,
before merging back to HEAD.
timing attacks.
BN_FLG_EXP_CONSTTIME requests this algorithm, and this done by default for
RSA/DSA/DH private key computations unless
RSA_FLAG_NO_EXP_CONSTTIME/DSA_FLAG_NO_EXP_CONSTTIME/
DH_FLAG_NO_EXP_CONSTTIME is set.
Submitted by: Matthew D Wood
Reviewed by: Bodo Moeller
locally initialising their own.
NB: I've removed the "BN_clear_free()" loops for the exit-paths in some of
these functions, and that may be a major part of the performance
improvements we're seeing. The "free" part can be removed because we're
using BN_CTX. The "clear" part OTOH can be removed because BN_CTX
destruction automatically performs this task, so performing it inside
functions that may be called repeatedly is wasteful. This is currently safe
within openssl due to the fact that BN_CTX objects are never created for
longer than a single high-level operation. However, that is only because
there's currently no mechanism in openssl for thread-local storage. Beyond
that, this might be an issue for applications using the bignum API directly
and caching their own BN_CTX objects. The solution is to introduce a flag
to BN_CTX_start() that allows its variables to be automatically sanitised
on release during BN_CTX_end(). This way any higher-level function (and
perhaps the application) can specify this flag in its own
BN_CTX_start()/BN_CTX_end() pair, and this will cause inner-loop functions
specifying the flag to be ignored so that sanitisation is handled only once
back out at the higher level. I will be implementing this in the near
future.
Remove certain redundant BN_zero() initialisations, because BN_CTX_get(),
BN_init(), [etc] already initialise to zero.
Correct error checking in bn_sqr.c, and be less wishy-wash about how/why
the result's 'top' value is set (note also, 'max' is always > 0 at this
point).
the same thing.
Also, I have some stuff on the back-burner related to some BN_CTX notes
from Peter Gutmann about his cryptlib hacks to the bignum code. The BN_CTX
comments are there to remind me of some relevant points in the code.
One problem that looked like a problem in bn_recp.c at first turned
out to be a BN_mul bug. An example is given in bn_recp.c; finding
the bug responsible for this is left as an exercise.
two functions that did expansion on in parameters (BN_mul() and
BN_sqr()). The problem was solved by making bn_dup_expand() which is
a mix of bn_expand2() and BN_dup().
because we're only handling words anyway) in BN_mod_exp_mont_word
making it a little faster for very small exponents,
and adjust the performance gain estimate in CHANGES according
to slightly more thorough measurements.
(15% faster than BN_mod_exp_mont for "large" base,
20% faster than BN_mod_exp_mont for small base.)