x86_64 platform. It targets specifically RSA1024 sign (using ideas
from http://eprint.iacr.org/2011/239) and adds more than 10% on most
platforms. Overall performance improvement relative to 1.0.0 is ~40%
in average, with best result of 54% on Westmere. Incidentally ~40%
is average improvement even for longer key lengths.
Exact improvement coefficients vary from one benchmark and platform to
another, e.g. it performs 70%-33% better on ARM, hereafter less for
longer keys, and 100%-90% better on x86_64.
level profiling data resulted in almost 50% performance improvement.
PA-RISC 1.1 is also reordered in same manner, mostly to be consistent,
as no gain was observed, not on PA-7100LC.
knock-on work than expected - they've been extracted into a patch
series that can be completed elsewhere, or in a different branch,
before merging back to HEAD.
deprecate the original (numeric-only) scheme, and replace with the
CRYPTO_THREADID object. This hides the platform-specifics and should reduce
the possibility for programming errors (where failing to explicitly check
both thread ID forms could create subtle, platform-specific bugs).
Thanks to Bodo, for invaluable review and feedback.
So in a sense this commit reverts few latest ones fixing bugs in original
code and improving it, most notably adding 64-bit support [though not in
BN_nist_mod_224 yet].
PR: 1593
to 'unsigned long' (ie. odd platforms/compilers), so a pointer-typed
version was added but it required portable code to check *both* modes to
determine equality. This commit maintains the availability of both thread
ID types, but deprecates the type-specific accessor APIs that invoke the
callbacks - instead a single type-independent API is used. This simplifies
software that calls into this interface, and should also make it less
error-prone - as forgetting to call and compare *both* thread ID accessors
could have led to hard-to-debug/infrequent bugs (that might only affect
certain platforms or thread implementations). As the CHANGES note says,
there were corresponding deprecations and replacements in the
thread-related functions for BN_BLINDING and ERR too.
CRYPTO_get_idptr_callback(), CRYPTO_thread_idptr() for a 'void *' type
thread ID, since the 'unsigned long' type of the existing thread ID
does not always work well.
have impact on performance, because amount of multiplications does not
increase with this switch, not on sparcv9 that is. On the contrary, it
actually improves performance, because it spares a load of instructions
used to chase carries. Not to mention that BN assembler modules can be
shared more freely between 32- and 64-bit builts.
timing attacks.
BN_FLG_EXP_CONSTTIME requests this algorithm, and this done by default for
RSA/DSA/DH private key computations unless
RSA_FLAG_NO_EXP_CONSTTIME/DSA_FLAG_NO_EXP_CONSTTIME/
DH_FLAG_NO_EXP_CONSTTIME is set.
Submitted by: Matthew D Wood
Reviewed by: Bodo Moeller
- possibility of re-creation of the blinding parameters after a
fixed number of uses (suggested by Bodo)
- calculatition of the rsa::e in case it's absent and p and q
are present (see bug report #785)
- improve the performance when if one rsa structure is shared by
more than a thread (see bug report #555)
- fix the problem described in bug report #827
- hide the definition ot the BN_BLINDING structure in bn_blind.c
COFF and a.out targets [similar to ELF targets]. You might notice some
rudementary support for shared mingw builds under cygwin. It works (it
produces cryptoeay32.dll and ssleay32.dll with everything exported by
name), but it's primarily for testing/debugging purposes, at least for
now...
- Remove unused and unuseful debug cruft.
- Remove unnecessary 'top' fudging from BN_copy().
- Fix a potential memory leak and simplify the expansion logic in
BN_bin2bn().
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
Yet another question: some time ago you changed BN_set_word.
Why didn't you change BN_get_word as well?
Quite. I'm also removing the older commented-out implementations to improve
readability. This complex stuff seems to date from a time when the types
didn't match up well.
Submitted by: Nils Larsch, Geoff Thorpe
the only function that uses it) because it would trip up an assertion in
bn_div_words() when first invoked. This also adds BN_div_word() testing to
bntest.
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
corresponding headers are only required for API functions or structure
details. This now includes the bignum types and BUF_MEM. Subsequent commits
will remove various dependencies on bn.h and buffer.h and update the
makefile dependencies.
especially for AIX. But most important BIGNUM assembler implementation
submitted by IBM.
Submitted by: Peter Waltenberg <pwalten@au1.ibm.com>
Reviewed by: appro
locally initialising their own.
NB: I've removed the "BN_clear_free()" loops for the exit-paths in some of
these functions, and that may be a major part of the performance
improvements we're seeing. The "free" part can be removed because we're
using BN_CTX. The "clear" part OTOH can be removed because BN_CTX
destruction automatically performs this task, so performing it inside
functions that may be called repeatedly is wasteful. This is currently safe
within openssl due to the fact that BN_CTX objects are never created for
longer than a single high-level operation. However, that is only because
there's currently no mechanism in openssl for thread-local storage. Beyond
that, this might be an issue for applications using the bignum API directly
and caching their own BN_CTX objects. The solution is to introduce a flag
to BN_CTX_start() that allows its variables to be automatically sanitised
on release during BN_CTX_end(). This way any higher-level function (and
perhaps the application) can specify this flag in its own
BN_CTX_start()/BN_CTX_end() pair, and this will cause inner-loop functions
specifying the flag to be ignored so that sanitisation is handled only once
back out at the higher level. I will be implementing this in the near
future.
little TODO list in there as well as the debugging code (only enabled if
BN_CTX_DEBUG is defined).
I'd appreciate as much review and testing as can be spared for this. I'll
commit some changes to other parts of the bignum code shortly to make
better use of this implementation (no more fixed size limitations). Note
also that under identical optimisations, I'm seeing a noticable speed
increase over openssl-0.9.7 - so any feedback to confirm/deny this on other
systems would also be most welcome.
- Remove some unnecessary "+1"-like fudges. Sizes should be handled
exactly, as enlarging size parameters causes needless bloat and may just
make bugs less likely rather than fixing them: bn_expand() macro,
bn_expand_internal(), and BN_sqr().
- Deprecate bn_dup_expand() - it's new since 0.9.7, unused, and not that
useful.
- Remove unnecessary zeroing of unused bytes in bn_expand2().
- Rewrite BN_set_word() - it should be much simpler, the previous
complexities probably date from old mismatched type issues.
- Add missing bn_check_top() macros in bn_word.c
- Improve some degenerate case handling in BN_[add|sub]_word(), add
comments, and avoid a bignum expansion if an overflow isn't possible.
Remove certain redundant BN_zero() initialisations, because BN_CTX_get(),
BN_init(), [etc] already initialise to zero.
Correct error checking in bn_sqr.c, and be less wishy-wash about how/why
the result's 'top' value is set (note also, 'max' is always > 0 at this
point).
bignums are passed in and out of functions and APIs in a consistent form
has highlighted that zero-valued bignums don't need any allocated word
data. The use of BN_set_word() to initialise a bignum to zero causes
needless allocation and gives it a return value that must be checked. This
change converts BN_zero() to a self-contained macro that has no
return/expression value and does not cause any expansion of bignum data.
Note, it would be tempting to rewrite the deprecated version as a
success-valued comma expression, such as;
#define BN_zero(a) ((a)->top = (a)->neg = 0, 1)
However, this evaluates 'a' twice and would confuse initialisation loops
(eg. while(..) { BN_zero(bn++) } ). As such, the deprecated version
continues to use BN_set_word().
redefine bn_clear_top2max() to be a NOP in the non-debugging case, and
remove some unnecessary usages in bn_nist.c.
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe, Ulf Möller
return a "zero" bignum as BN_new() does - so reset 'top'. During
BN_CTX_end(), released bignums should be consistent so enforce this in
debug builds. Also, reduce the number of wasted BN_clear_free() calls from
BN_CTX_end() (typically by 75% or so).
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe, Ulf Möller
automatically to accomodate the value, some compilers fail to do so. Most
notably 0x0123456789ABCDEF should come out as long long in 32-bit context,
but HP compiler truncates it to 32-bit value. Which in turn breaks GF(2^m)
arithmetics in hpux-parisc2-cc build. Therefore this fix...
Use BUF_strlcat() instead of strcat().
Use BIO_snprintf() instead of sprintf().
In some cases, keep better track of buffer lengths.
This is part of a large change submitted by Markus Friedl <markus@openbsd.org>
- Add missing bn_check_top() calls and relocate some others
- Use BN_is_zero() where appropriate
- Remove assert()s that bn_check_top() is already covering
- Simplify the code in places (esp. bn_expand2())
- Only keep ambiguous zero handling if BN_STRICT isn't defined
- Remove some white-space and make some other aesthetic tweaks
the same thing.
Also, I have some stuff on the back-burner related to some BN_CTX notes
from Peter Gutmann about his cryptlib hacks to the bignum code. The BN_CTX
comments are there to remind me of some relevant points in the code.
once in the source (where it is set for the benefit of no other code
whatsoever). I've deprecated the declaration in the header and likewise
made the use of the flag conditional in bn_lib.c. Note, this change also
NULLs the 'd' pointer in a BIGNUM when it is reset but not deallocated.
which, in turn, are used nowhere at all. This is a good thing because
bn_set_max() would currently generate code that wouldn't compile (BIGNUM
has no 'max' element).
The only apparent use for bn_set_[low|high] would be for implementing
windowing algorithms, and all of openssl's seem to use bn_***_words()
helpers instead (including the BN_div() that Nils fixed recently, which had
been using independently-coded versions of what these unused macros are
intended for). I'm therefore consigning these macros to cvs oblivion in the
name of readability.
bn_correct_top() or bn_check_top() depending on debug settings. For
internal source, all bn_fix_top()s should be converted one way or the other
depending on whether the use of bn_correct_top() is justified.
For BN_div_recp(), these cases should not require correction if the other
bignum functions are doing their jobs properly, so convert to
bn_check_top().
(ie. where top may be zero, or it may be one if the corresponding word is
set to zero). Note, this only affects the macros in bn.h, there are probably
similar corrections required in some c files.
Also, clarify the audit-related macros at the top of the header. Mental
note: I must not forget to clean all this out before 0.9.8 is released ...
against inconsistent BIGNUMs coming out of any of its API functions. So
this change no longer "fixes" the bn_print.c functions, but it makes for
cleaner code. This patch was a part of ticket 697.
PR: 697
Submitted by: Otto Moerbeek
Reviewed by: Geoff Thorpe
ticket 697 (though uses a different solution than the proposed one). This
problem was initially raised by Otto Moerbeek.
PR: 697
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
and bn_add_words to avoid using fake bignums to window other bignums that
can lead to corruption. This change allows all bignum tests to pass with
BN_DEBUG and BN_DEBUG_RAND debugging and valgrind. NB: This should be
tested on a few different architectures and configuration targets, as the
bignum code this deals with is quite preprocessor (and assembly) sensitive.
Submitted by: Nils Narsch
Reviewed by: Geoff Thorpe, Ulf Moeller
generally a more efficient comparison than comparing two integers, and the
first of these two loops was off-by-one (copying one too many values). This
change also removes a superfluous assignment that would set an unused word
to zero (and potentially allow an overrun in some cases).
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
constant BIGNUMs. It turns out that this trips up different but equally
useful compiler warnings to -Wcast-qual, and so wasn't worth the ugliness
it created. (Thanks to Ulf for the forehead-slap.)
and structures as constant without having to cast away const at any point.
There is still plenty of other code that makes gcc's "-Wcast-qual" unhappy,
but crypto/bn/ is now ok. Purists are welcome to suggest alternatives.
constructing BIGNUM structures with pointers offset into other bignums
(among other things). This corrects some of it that is too plainly insane,
and tries to ensure that bignums are normalised when passed to other
functions.
structures being passed in to or out of API functions, and this corrects a
couple of cases found so far.
Also, lop off a couple of bytes of white-space.
I have tried to convert 'len' type variable declarations to unsigned as a
means to address these warnings when appropriate, but when in doubt I have
used casts in the comparisons instead. The better solution (that would get
us all lynched by API users) would be to go through and convert all the
function prototypes and structure definitions to use unsigned variables
except when signed is necessary. The proliferation of (signed) "int" for
strictly non-negative uses is unfortunate.
except internally to the allocator BN_CTX_new(), as such this deprecates
the use of BN_CTX_init() in the API. Moreover, the structure definition of
BN_CTX is taken out of bn_lcl.h and moved into bn_ctx.c itself.
NDEBUG should probably only be "forced" in the top-level configuration, but
until it is I will avoid removing it from bn_ctx.c which might surprise
people with massive slow-downs in their keygens. So I've left it in
bn_ctx.c but tidied up the preprocessor logic a touch and made it more
tolerant of debugging efforts.
be) precompiled out in the API headers. This change is to ensure that if
it is defined when compiling openssl, the deprecated functions aren't
implemented either.
sure the loop does correctly stop and breaking ("division by zero")
modulus operations are not performed. The (pre-generated) prime
table crypto/bn/bn_prime.h was already correct, but it could not be
re-generated on some platforms because of the "division by zero"
situation in the script.
replaced #if logic around bn_sub_part_words in bn_mul.c. I rely upon
OPENSSL_BN_ASM_PART_WORDS being added by ./Configure script. Would it
still work on non-Unix platforms?
- a patch to fix a memory leak in rsa_gen.c
- a note about compiler warnings with unions
- a note about improving structure element names
This applies his patch and implements a solution to the notes.
key-generation and prime-checking functions. Rather than explicitly passing
callback functions and caller-defined context data for the callbacks, a new
structure BN_GENCB is defined that encapsulates this; a pointer to the
structure is passed to all such functions instead.
This wrapper structure allows the encapsulation of "old" and "new" style
callbacks - "new" callbacks return a boolean result on the understanding
that returning FALSE should terminate keygen/primality processing. The
BN_GENCB abstraction will allow future callback modifications without
needing to break binary compatibility nor change the API function
prototypes. The new API functions have been given names ending in "_ex" and
the old functions are implemented as wrappers to the new ones. The
OPENSSL_NO_DEPRECATED symbol has been introduced so that, if defined,
declaration of the older functions will be skipped. NB: Some
openssl-internal code will stick with the older callbacks for now, so
appropriate "#undef" logic will be put in place - this is in case the user
is *building* openssl (rather than *including* its headers) with this
symbol defined.
There is another change in the new _ex functions; the key-generation
functions do not return key structures but operate on structures passed by
the caller, the return value is a boolean. This will allow for a smoother
transition to having key-generation as "virtual function" in the various
***_METHOD tables.
the divisor was a bit more complex than I first saw. The lost bit
can't just be discarded, as there are cases where it is important.
For example, look at dividing 320000 with 80000 vs. 80001 (all
decimals), the difference is crucial. The trick here is to check if
that lost bit was 1, and in that case, do the following:
1. subtract the quotient from the remainder
2. as long as the remainder is negative, add the divisor (the whole
divisor, not the shofted down copy) to it, and decrease the
quotient by one.
There's probably a nice mathematical proof for this already, but I
won't bother with that, unless someone requests it from me.