The lazy-initialisation of BN_MONT_CTX was serialising all threads, as
noted by Daniel Sands and co at Sandia. This was to handle the case that
2 or more threads race to lazy-init the same context, but stunted all
scalability in the case where 2 or more threads are doing unrelated
things! We favour the latter case by punishing the former. The init work
gets done by each thread that finds the context to be uninitialised, and
we then lock the "set" logic after that work is done - the winning
thread's work gets used, the losing threads throw away what they've done.
Signed-off-by: Geoff Thorpe <geoff@openssl.org>
you need to use "enable-montasm" to see a difference. (Huge speed
advantage, but BN_MONT_CTX is not binary compatible, so this can't be
enabled by default in the 0.9.8 branch.)
The CHANGES entry also covers the 64-bit x86 backport in November 2007
by appro.
locally initialising their own.
NB: I've removed the "BN_clear_free()" loops for the exit-paths in some of
these functions, and that may be a major part of the performance
improvements we're seeing. The "free" part can be removed because we're
using BN_CTX. The "clear" part OTOH can be removed because BN_CTX
destruction automatically performs this task, so performing it inside
functions that may be called repeatedly is wasteful. This is currently safe
within openssl due to the fact that BN_CTX objects are never created for
longer than a single high-level operation. However, that is only because
there's currently no mechanism in openssl for thread-local storage. Beyond
that, this might be an issue for applications using the bignum API directly
and caching their own BN_CTX objects. The solution is to introduce a flag
to BN_CTX_start() that allows its variables to be automatically sanitised
on release during BN_CTX_end(). This way any higher-level function (and
perhaps the application) can specify this flag in its own
BN_CTX_start()/BN_CTX_end() pair, and this will cause inner-loop functions
specifying the flag to be ignored so that sanitisation is handled only once
back out at the higher level. I will be implementing this in the near
future.
Remove certain redundant BN_zero() initialisations, because BN_CTX_get(),
BN_init(), [etc] already initialise to zero.
Correct error checking in bn_sqr.c, and be less wishy-wash about how/why
the result's 'top' value is set (note also, 'max' is always > 0 at this
point).
One problem that looked like a problem in bn_recp.c at first turned
out to be a BN_mul bug. An example is given in bn_recp.c; finding
the bug responsible for this is left as an exercise.
two functions that did expansion on in parameters (BN_mul() and
BN_sqr()). The problem was solved by making bn_dup_expand() which is
a mix of bn_expand2() and BN_dup().
BN_mod_mul_montgomery, which calls bn_sqr_recursive
without much preparation.
bn_sqr_recursive requires the length of its argument to be
a power of 2, which is not always the case here.
There's no reason for not using BN_sqr -- if a simpler
approach to squaring made sense, then why not change
BN_sqr? (Using BN_sqr should also speed up DH where g is chosen
such that it becomes small [e.g., 2] when converted
to Montgomery representation.)
Case closed :-)
like Malloc, Realloc and especially Free conflict with already existing names
on some operating systems or other packages. That is reason enough to change
the names of the OpenSSL memory allocation macros to something that has a
better chance of being unique, like prepending them with OPENSSL_.
This change includes all the name changes needed throughout all C files.