Yet another question: some time ago you changed BN_set_word.
Why didn't you change BN_get_word as well?
Quite. I'm also removing the older commented-out implementations to improve
readability. This complex stuff seems to date from a time when the types
didn't match up well.
Submitted by: Nils Larsch, Geoff Thorpe
the only function that uses it) because it would trip up an assertion in
bn_div_words() when first invoked. This also adds BN_div_word() testing to
bntest.
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
corresponding headers are only required for API functions or structure
details. This now includes the bignum types and BUF_MEM. Subsequent commits
will remove various dependencies on bn.h and buffer.h and update the
makefile dependencies.
especially for AIX. But most important BIGNUM assembler implementation
submitted by IBM.
Submitted by: Peter Waltenberg <pwalten@au1.ibm.com>
Reviewed by: appro
locally initialising their own.
NB: I've removed the "BN_clear_free()" loops for the exit-paths in some of
these functions, and that may be a major part of the performance
improvements we're seeing. The "free" part can be removed because we're
using BN_CTX. The "clear" part OTOH can be removed because BN_CTX
destruction automatically performs this task, so performing it inside
functions that may be called repeatedly is wasteful. This is currently safe
within openssl due to the fact that BN_CTX objects are never created for
longer than a single high-level operation. However, that is only because
there's currently no mechanism in openssl for thread-local storage. Beyond
that, this might be an issue for applications using the bignum API directly
and caching their own BN_CTX objects. The solution is to introduce a flag
to BN_CTX_start() that allows its variables to be automatically sanitised
on release during BN_CTX_end(). This way any higher-level function (and
perhaps the application) can specify this flag in its own
BN_CTX_start()/BN_CTX_end() pair, and this will cause inner-loop functions
specifying the flag to be ignored so that sanitisation is handled only once
back out at the higher level. I will be implementing this in the near
future.
little TODO list in there as well as the debugging code (only enabled if
BN_CTX_DEBUG is defined).
I'd appreciate as much review and testing as can be spared for this. I'll
commit some changes to other parts of the bignum code shortly to make
better use of this implementation (no more fixed size limitations). Note
also that under identical optimisations, I'm seeing a noticable speed
increase over openssl-0.9.7 - so any feedback to confirm/deny this on other
systems would also be most welcome.
- Remove some unnecessary "+1"-like fudges. Sizes should be handled
exactly, as enlarging size parameters causes needless bloat and may just
make bugs less likely rather than fixing them: bn_expand() macro,
bn_expand_internal(), and BN_sqr().
- Deprecate bn_dup_expand() - it's new since 0.9.7, unused, and not that
useful.
- Remove unnecessary zeroing of unused bytes in bn_expand2().
- Rewrite BN_set_word() - it should be much simpler, the previous
complexities probably date from old mismatched type issues.
- Add missing bn_check_top() macros in bn_word.c
- Improve some degenerate case handling in BN_[add|sub]_word(), add
comments, and avoid a bignum expansion if an overflow isn't possible.
Remove certain redundant BN_zero() initialisations, because BN_CTX_get(),
BN_init(), [etc] already initialise to zero.
Correct error checking in bn_sqr.c, and be less wishy-wash about how/why
the result's 'top' value is set (note also, 'max' is always > 0 at this
point).
bignums are passed in and out of functions and APIs in a consistent form
has highlighted that zero-valued bignums don't need any allocated word
data. The use of BN_set_word() to initialise a bignum to zero causes
needless allocation and gives it a return value that must be checked. This
change converts BN_zero() to a self-contained macro that has no
return/expression value and does not cause any expansion of bignum data.
Note, it would be tempting to rewrite the deprecated version as a
success-valued comma expression, such as;
#define BN_zero(a) ((a)->top = (a)->neg = 0, 1)
However, this evaluates 'a' twice and would confuse initialisation loops
(eg. while(..) { BN_zero(bn++) } ). As such, the deprecated version
continues to use BN_set_word().
redefine bn_clear_top2max() to be a NOP in the non-debugging case, and
remove some unnecessary usages in bn_nist.c.
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe, Ulf Möller
return a "zero" bignum as BN_new() does - so reset 'top'. During
BN_CTX_end(), released bignums should be consistent so enforce this in
debug builds. Also, reduce the number of wasted BN_clear_free() calls from
BN_CTX_end() (typically by 75% or so).
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe, Ulf Möller
automatically to accomodate the value, some compilers fail to do so. Most
notably 0x0123456789ABCDEF should come out as long long in 32-bit context,
but HP compiler truncates it to 32-bit value. Which in turn breaks GF(2^m)
arithmetics in hpux-parisc2-cc build. Therefore this fix...
Use BUF_strlcat() instead of strcat().
Use BIO_snprintf() instead of sprintf().
In some cases, keep better track of buffer lengths.
This is part of a large change submitted by Markus Friedl <markus@openbsd.org>
- Add missing bn_check_top() calls and relocate some others
- Use BN_is_zero() where appropriate
- Remove assert()s that bn_check_top() is already covering
- Simplify the code in places (esp. bn_expand2())
- Only keep ambiguous zero handling if BN_STRICT isn't defined
- Remove some white-space and make some other aesthetic tweaks
the same thing.
Also, I have some stuff on the back-burner related to some BN_CTX notes
from Peter Gutmann about his cryptlib hacks to the bignum code. The BN_CTX
comments are there to remind me of some relevant points in the code.
once in the source (where it is set for the benefit of no other code
whatsoever). I've deprecated the declaration in the header and likewise
made the use of the flag conditional in bn_lib.c. Note, this change also
NULLs the 'd' pointer in a BIGNUM when it is reset but not deallocated.
which, in turn, are used nowhere at all. This is a good thing because
bn_set_max() would currently generate code that wouldn't compile (BIGNUM
has no 'max' element).
The only apparent use for bn_set_[low|high] would be for implementing
windowing algorithms, and all of openssl's seem to use bn_***_words()
helpers instead (including the BN_div() that Nils fixed recently, which had
been using independently-coded versions of what these unused macros are
intended for). I'm therefore consigning these macros to cvs oblivion in the
name of readability.
bn_correct_top() or bn_check_top() depending on debug settings. For
internal source, all bn_fix_top()s should be converted one way or the other
depending on whether the use of bn_correct_top() is justified.
For BN_div_recp(), these cases should not require correction if the other
bignum functions are doing their jobs properly, so convert to
bn_check_top().
(ie. where top may be zero, or it may be one if the corresponding word is
set to zero). Note, this only affects the macros in bn.h, there are probably
similar corrections required in some c files.
Also, clarify the audit-related macros at the top of the header. Mental
note: I must not forget to clean all this out before 0.9.8 is released ...
against inconsistent BIGNUMs coming out of any of its API functions. So
this change no longer "fixes" the bn_print.c functions, but it makes for
cleaner code. This patch was a part of ticket 697.
PR: 697
Submitted by: Otto Moerbeek
Reviewed by: Geoff Thorpe
ticket 697 (though uses a different solution than the proposed one). This
problem was initially raised by Otto Moerbeek.
PR: 697
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
and bn_add_words to avoid using fake bignums to window other bignums that
can lead to corruption. This change allows all bignum tests to pass with
BN_DEBUG and BN_DEBUG_RAND debugging and valgrind. NB: This should be
tested on a few different architectures and configuration targets, as the
bignum code this deals with is quite preprocessor (and assembly) sensitive.
Submitted by: Nils Narsch
Reviewed by: Geoff Thorpe, Ulf Moeller
generally a more efficient comparison than comparing two integers, and the
first of these two loops was off-by-one (copying one too many values). This
change also removes a superfluous assignment that would set an unused word
to zero (and potentially allow an overrun in some cases).
Submitted by: Nils Larsch
Reviewed by: Geoff Thorpe
constant BIGNUMs. It turns out that this trips up different but equally
useful compiler warnings to -Wcast-qual, and so wasn't worth the ugliness
it created. (Thanks to Ulf for the forehead-slap.)
and structures as constant without having to cast away const at any point.
There is still plenty of other code that makes gcc's "-Wcast-qual" unhappy,
but crypto/bn/ is now ok. Purists are welcome to suggest alternatives.
constructing BIGNUM structures with pointers offset into other bignums
(among other things). This corrects some of it that is too plainly insane,
and tries to ensure that bignums are normalised when passed to other
functions.
structures being passed in to or out of API functions, and this corrects a
couple of cases found so far.
Also, lop off a couple of bytes of white-space.
I have tried to convert 'len' type variable declarations to unsigned as a
means to address these warnings when appropriate, but when in doubt I have
used casts in the comparisons instead. The better solution (that would get
us all lynched by API users) would be to go through and convert all the
function prototypes and structure definitions to use unsigned variables
except when signed is necessary. The proliferation of (signed) "int" for
strictly non-negative uses is unfortunate.
except internally to the allocator BN_CTX_new(), as such this deprecates
the use of BN_CTX_init() in the API. Moreover, the structure definition of
BN_CTX is taken out of bn_lcl.h and moved into bn_ctx.c itself.
NDEBUG should probably only be "forced" in the top-level configuration, but
until it is I will avoid removing it from bn_ctx.c which might surprise
people with massive slow-downs in their keygens. So I've left it in
bn_ctx.c but tidied up the preprocessor logic a touch and made it more
tolerant of debugging efforts.
be) precompiled out in the API headers. This change is to ensure that if
it is defined when compiling openssl, the deprecated functions aren't
implemented either.
sure the loop does correctly stop and breaking ("division by zero")
modulus operations are not performed. The (pre-generated) prime
table crypto/bn/bn_prime.h was already correct, but it could not be
re-generated on some platforms because of the "division by zero"
situation in the script.
replaced #if logic around bn_sub_part_words in bn_mul.c. I rely upon
OPENSSL_BN_ASM_PART_WORDS being added by ./Configure script. Would it
still work on non-Unix platforms?
- a patch to fix a memory leak in rsa_gen.c
- a note about compiler warnings with unions
- a note about improving structure element names
This applies his patch and implements a solution to the notes.
key-generation and prime-checking functions. Rather than explicitly passing
callback functions and caller-defined context data for the callbacks, a new
structure BN_GENCB is defined that encapsulates this; a pointer to the
structure is passed to all such functions instead.
This wrapper structure allows the encapsulation of "old" and "new" style
callbacks - "new" callbacks return a boolean result on the understanding
that returning FALSE should terminate keygen/primality processing. The
BN_GENCB abstraction will allow future callback modifications without
needing to break binary compatibility nor change the API function
prototypes. The new API functions have been given names ending in "_ex" and
the old functions are implemented as wrappers to the new ones. The
OPENSSL_NO_DEPRECATED symbol has been introduced so that, if defined,
declaration of the older functions will be skipped. NB: Some
openssl-internal code will stick with the older callbacks for now, so
appropriate "#undef" logic will be put in place - this is in case the user
is *building* openssl (rather than *including* its headers) with this
symbol defined.
There is another change in the new _ex functions; the key-generation
functions do not return key structures but operate on structures passed by
the caller, the return value is a boolean. This will allow for a smoother
transition to having key-generation as "virtual function" in the various
***_METHOD tables.
the divisor was a bit more complex than I first saw. The lost bit
can't just be discarded, as there are cases where it is important.
For example, look at dividing 320000 with 80000 vs. 80001 (all
decimals), the difference is crucial. The trick here is to check if
that lost bit was 1, and in that case, do the following:
1. subtract the quotient from the remainder
2. as long as the remainder is negative, add the divisor (the whole
divisor, not the shofted down copy) to it, and decrease the
quotient by one.
There's probably a nice mathematical proof for this already, but I
won't bother with that, unless someone requests it from me.
PR: 338
Here's the description, submitted by Gisle Vanem <giva@bgnett.no>:
1. sock_init() renamed to ssl_sock_init() in ./apps/s_socket.c due
to name-clash with Watt-32.
2. rand() renamed to Rand() in ./crypto/bn/divtest.c due to name-clash
with <stdlib.h>
3. Added calls to dbug_init()/sock_init() in some demo programs.
4. Changed cflags/lflags in configure. Watt-32 install root now taken
from $WATT_ROOT.
the new method names where _GF... suffixes have been removed.
Revert changes to ..._{get/set}_Jprojective_coordinates_...:
The current implementation for ECC over binary fields does not use
projective coordinates, and if it did, it would not use Jacobian
projective coordinates; so it's OK to use the ..._GFp prefix for all
this.
Add author attributions to some files so that it doesn't look
as if Sun wrote all of this :-)
The 'OPENSSL_NO_SUN_DIV' default is still subject to change,
so I didn't bother to finish the CHANGES entry yet.
Submitted by: Douglas Stebila <douglas.stebila@sun.com>, Sheueling Chang <sheueling.chang@sun.com>
(CHANGES entry by Bodo Moeller)
happens reliably, even if the BIGNUM is already sufficiently large.
[Note that the bn_expand()/bn_wexpand() macros call bn_expand2() only
if the BIGNUM actually has to grow, so this change does not add any
new overhead as currently bn_expand2() is never called directly.]
HP-UX in common in ./config). Note that for the moment of this writing
none of 64-bit platforms pass bntest. I'm committing this anyway as it's
too frustrating to patch snapshots over and over while 0.9.6 is known to
work.
details (performance numbers and accompanying discussions:-). Note that
the code is not engaged in ./Configure yet. I'll add it later this week
along with updates for .spec file.
Submitted by:
Reviewed by:
PR:
It does not appear to be faster than the current Montgomery code
except for very small moduli (somewhere between 192 and 224 bits
in a 64-bit Sun environment, and even less than 192 bits
on 32 bit systems).
and make all files the depend on it include it without prefixing it
with openssl/.
This means that all Makefiles will have $(TOP) as one of the include
directories.
sure they are available in opensslconf.h, by giving them names starting
with "OPENSSL_" to avoid conflicts with other packages and by making
sure e_os2.h will cover all platform-specific cases together with
opensslconf.h.
I've checked fairly well that nothing breaks with this (apart from
external software that will adapt if they have used something like
NO_KRB5), but I can't guarantee it completely, so a review of this
change would be a good thing.
Bleichenbacher's DSA attack. With this implementation, the expected
number of iterations never exceeds 2.
New semantics for BN_rand_range():
BN_rand_range(r, min, range) now generates r such that
min <= r < min+range.
(Previously, BN_rand_range(r, min, max) generated r such that
min <= r < max.
It is more convenient to have the range; also the previous
prototype was misleading because max was larger than
the actual maximum.)
for its ASN1 operations as well as the old style function
pointers (i2d, d2i, new, free). Change standard extensions
to support this.
Fix a warning in BN_mul(), bn_mul.c about uninitialised 'j'.
One problem that looked like a problem in bn_recp.c at first turned
out to be a BN_mul bug. An example is given in bn_recp.c; finding
the bug responsible for this is left as an exercise.
course, that means we need to handle the cases where the two arrays to
bn_mul_recursive() and bn_mul_part_recursive() differ in size.
I haven't yet changed the comments that describe bn_mul_recursive()
and bn_mul_part_recursive(). I want this to be tested by more people
before I consider this change final. Please test away!
The bn_cmp_part_words bug was only caught in the BN_mod_mul() test,
not in the BN_mul() test, so apparently the choice of parameters in
some cases is bad.
But even if this is avoided, there are still segmentation violations
(during one of the BN_free()s at the end of test_kron
in some cases, in other cases during BN_kronecker, or
later in BN_sqrt; choosing a different exponentiation
algorithm in bntest.c appears to influence when the SIGSEGV
takes place).
so we have to reduce the random numbers used in test_mont.
Before this change, test_mont failed in [debug-]solaris-sparcv9-gcc
configurations ("Montgomery multiplication test failed!" because
the multiplication result obtained with Montgomery multiplication
differed from the result obtained by BN_mod_mul).
Substituing the old version of bn_gcd.c (BN_mod_inverse) did not avoid
the problem.
The strange thing is that it I did not observe any problems
when using debug-solaris-sparcv8-gcc and solaris-sparcv9-cc,
as well as when compiling OpenSSL 0.9.6 in the solaric-sparcv9-gcc
configuration on the same system.
This caused a segmentation fault in calls to malloc, so I cleaned up
bn_lib.c a little so that it is easier to see what is going on.
The bug turned out to be an off-by-one error in BN_bin2bn.
These new files will not be included literally in OpenSSL, but I intend
to integrate most of their contents. Most file names will change,
and when the integration is done, the superfluous files will be deleted.
Submitted by: Lenka Fibikova <fibikova@exp-math.uni-essen.de>
I'm a little bit nervous about bn_div_words, as I don't know what it's
supposed to return on overflow. For now, I trust the rest of the
system to give it numbers that will not cause any overflow...
BN_mul() correctly constified, avoids two realloc()'s that aren't
really necessary and saves memory to boot. This required a small
change in bn_mul_part_recursive() and the addition of variants of
bn_cmp_words(), bn_add_words() and bn_sub_words() that can take arrays
with differing sizes.
The test results show a performance that very closely matches the
original code from before my constification. This may seem like a
very small win from a performance point of view, but if one remembers
that the variants of bn_cmp_words(), bn_add_words() and bn_sub_words()
are not at all optimized for the moment (and there's no corresponding
assembler code), and that their use may be just as non-optimal, I'm
pretty confident there are possibilities...
This code needs reviewing!
two functions that did expansion on in parameters (BN_mul() and
BN_sqr()). The problem was solved by making bn_dup_expand() which is
a mix of bn_expand2() and BN_dup().
BN_mod_mul_montgomery, which calls bn_sqr_recursive
without much preparation.
bn_sqr_recursive requires the length of its argument to be
a power of 2, which is not always the case here.
There's no reason for not using BN_sqr -- if a simpler
approach to squaring made sense, then why not change
BN_sqr? (Using BN_sqr should also speed up DH where g is chosen
such that it becomes small [e.g., 2] when converted
to Montgomery representation.)
Case closed :-)
Also, "make update" has added some missing functions to libeay.num,
updated the TABLE for the alpha changes, and updated thousands of
dependancies that have changed from recent commits.
because we're only handling words anyway) in BN_mod_exp_mont_word
making it a little faster for very small exponents,
and adjust the performance gain estimate in CHANGES according
to slightly more thorough measurements.
(15% faster than BN_mod_exp_mont for "large" base,
20% faster than BN_mod_exp_mont for small base.)
like Malloc, Realloc and especially Free conflict with already existing names
on some operating systems or other packages. That is reason enough to change
the names of the OpenSSL memory allocation macros to something that has a
better chance of being unique, like prepending them with OPENSSL_.
This change includes all the name changes needed throughout all C files.
Remove one ampersand so the compiler may complain less.
Make rand() static so it will not conflict with the C RTL.
Make bug() static too, for good measure.
Apparently BN_div_recp reports an error for small divisors
(1,2,4,8,40).
I haven't got mismatches so far. If you can, please run the test
program for a few days (nohup divtest >out& or something), and if it
reports a mismatch, post the output.
X is 5120 on 32-bit and 151552 on 64-bit architectures and I varies
from 0 to 4. As result the test was *unreasonably* slow and virtually
impossible to complete on 64-bit architectures (e.g. IRIX bc couldn't
even swallow such long lines).
crypto/bn/bn_lcl.h for further details. It should be noted that for
the moment of this writing the code was tested only on Alpha. If
compiled with DEC C the C implementation exhibits 12% performance
improvement over the crypto/bn/asm/alpha.s (on EV56 box running
AlphaLinux). GNU C is (unfortunately) 8% behind the assembler
implementation. But it's OpenVMS Alpha users who *may* benefit most
as 'apps/openssl speed rsa' exhibits 6 (six) times performance
improvement over the original VMS bignum implementation. Where "*may*"
means "as soon as code is enabled though #define SIXTY_FOUR_BIT and
crypto/bn/asm/vms.mar is skipped."
because otherwise BN_rand will fail unless DEVRANDOM works,
which causes the programs to dump core because they
don't check the return value of BN_rand (and if they
did, we still couldn't test anything).
- add comment to some files that appear not to be used at all.
returns int (1 = ok, 0 = not seeded). New function RAND_add() is the
same as RAND_seed() but takes an estimate of the entropy as an additional
argument.
problem was that one of the replacement routines had not been working since
SSLeay releases. For now the offending routine has been replaced with
non-optimised assembler. Even so, this now gives around 95% performance
improvement for 1024 bit RSA signs.
the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.
CPU BN_div bn_div_words overall comment
------------------------------------------------------------------------
PII +16% accumulated by +2-3% PII multiplies damn fast! Taking
inlining multiplication out of the loop
didn't make too much difference.
Eliminating of the multiplication
involved in remainder calculation
is the major factor.
Pentium +45% accumulated by +7-9% mull isn't that fast and replacing
inlining multiplications with additions in
the loop has more visible effect:-)
MIPS +75% +12% +20-25% In addition to the taking mults
R10000 out of the loop (giving 12% in the
asm/mips3.s) three mults were
eliminated in BN_div.
Alpha +30% +50% +10-15% Same as above. But remember that
EV4 bn_div_words is a C implementation.
It takes 4 Alpha mults in C to do
the same thing as 1 MIPS mult in
assembler does. So the effect (50%)
is more impressive. But not the
overall one... Well, if Alpha
bn_mul_add would be implemented
in assembler overall improvement
would be closer to MIPS...