and bn_add_words to avoid using fake bignums to window other bignums that
can lead to corruption. This change allows all bignum tests to pass with
BN_DEBUG and BN_DEBUG_RAND debugging and valgrind. NB: This should be
tested on a few different architectures and configuration targets, as the
bignum code this deals with is quite preprocessor (and assembly) sensitive.
Submitted by: Nils Narsch
Reviewed by: Geoff Thorpe, Ulf Moeller
constructing BIGNUM structures with pointers offset into other bignums
(among other things). This corrects some of it that is too plainly insane,
and tries to ensure that bignums are normalised when passed to other
functions.
sure they are available in opensslconf.h, by giving them names starting
with "OPENSSL_" to avoid conflicts with other packages and by making
sure e_os2.h will cover all platform-specific cases together with
opensslconf.h.
I've checked fairly well that nothing breaks with this (apart from
external software that will adapt if they have used something like
NO_KRB5), but I can't guarantee it completely, so a review of this
change would be a good thing.
crypto/bn/bn_lcl.h for further details. It should be noted that for
the moment of this writing the code was tested only on Alpha. If
compiled with DEC C the C implementation exhibits 12% performance
improvement over the crypto/bn/asm/alpha.s (on EV56 box running
AlphaLinux). GNU C is (unfortunately) 8% behind the assembler
implementation. But it's OpenVMS Alpha users who *may* benefit most
as 'apps/openssl speed rsa' exhibits 6 (six) times performance
improvement over the original VMS bignum implementation. Where "*may*"
means "as soon as code is enabled though #define SIXTY_FOUR_BIT and
crypto/bn/asm/vms.mar is skipped."
the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.
CPU BN_div bn_div_words overall comment
------------------------------------------------------------------------
PII +16% accumulated by +2-3% PII multiplies damn fast! Taking
inlining multiplication out of the loop
didn't make too much difference.
Eliminating of the multiplication
involved in remainder calculation
is the major factor.
Pentium +45% accumulated by +7-9% mull isn't that fast and replacing
inlining multiplications with additions in
the loop has more visible effect:-)
MIPS +75% +12% +20-25% In addition to the taking mults
R10000 out of the loop (giving 12% in the
asm/mips3.s) three mults were
eliminated in BN_div.
Alpha +30% +50% +10-15% Same as above. But remember that
EV4 bn_div_words is a C implementation.
It takes 4 Alpha mults in C to do
the same thing as 1 MIPS mult in
assembler does. So the effect (50%)
is more impressive. But not the
overall one... Well, if Alpha
bn_mul_add would be implemented
in assembler overall improvement
would be closer to MIPS...