crypto/bn/bn_lcl.h for further details. It should be noted that for
the moment of this writing the code was tested only on Alpha. If
compiled with DEC C the C implementation exhibits 12% performance
improvement over the crypto/bn/asm/alpha.s (on EV56 box running
AlphaLinux). GNU C is (unfortunately) 8% behind the assembler
implementation. But it's OpenVMS Alpha users who *may* benefit most
as 'apps/openssl speed rsa' exhibits 6 (six) times performance
improvement over the original VMS bignum implementation. Where "*may*"
means "as soon as code is enabled though #define SIXTY_FOUR_BIT and
crypto/bn/asm/vms.mar is skipped."
because otherwise BN_rand will fail unless DEVRANDOM works,
which causes the programs to dump core because they
don't check the return value of BN_rand (and if they
did, we still couldn't test anything).
- add comment to some files that appear not to be used at all.
returns int (1 = ok, 0 = not seeded). New function RAND_add() is the
same as RAND_seed() but takes an estimate of the entropy as an additional
argument.
problem was that one of the replacement routines had not been working since
SSLeay releases. For now the offending routine has been replaced with
non-optimised assembler. Even so, this now gives around 95% performance
improvement for 1024 bit RSA signs.
the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.
CPU BN_div bn_div_words overall comment
------------------------------------------------------------------------
PII +16% accumulated by +2-3% PII multiplies damn fast! Taking
inlining multiplication out of the loop
didn't make too much difference.
Eliminating of the multiplication
involved in remainder calculation
is the major factor.
Pentium +45% accumulated by +7-9% mull isn't that fast and replacing
inlining multiplications with additions in
the loop has more visible effect:-)
MIPS +75% +12% +20-25% In addition to the taking mults
R10000 out of the loop (giving 12% in the
asm/mips3.s) three mults were
eliminated in BN_div.
Alpha +30% +50% +10-15% Same as above. But remember that
EV4 bn_div_words is a C implementation.
It takes 4 Alpha mults in C to do
the same thing as 1 MIPS mult in
assembler does. So the effect (50%)
is more impressive. But not the
overall one... Well, if Alpha
bn_mul_add would be implemented
in assembler overall improvement
would be closer to MIPS...