the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.
CPU BN_div bn_div_words overall comment
------------------------------------------------------------------------
PII +16% accumulated by +2-3% PII multiplies damn fast! Taking
inlining multiplication out of the loop
didn't make too much difference.
Eliminating of the multiplication
involved in remainder calculation
is the major factor.
Pentium +45% accumulated by +7-9% mull isn't that fast and replacing
inlining multiplications with additions in
the loop has more visible effect:-)
MIPS +75% +12% +20-25% In addition to the taking mults
R10000 out of the loop (giving 12% in the
asm/mips3.s) three mults were
eliminated in BN_div.
Alpha +30% +50% +10-15% Same as above. But remember that
EV4 bn_div_words is a C implementation.
It takes 4 Alpha mults in C to do
the same thing as 1 MIPS mult in
assembler does. So the effect (50%)
is more impressive. But not the
overall one... Well, if Alpha
bn_mul_add would be implemented
in assembler overall improvement
would be closer to MIPS...
in cryptlib.h (which is often included as "../cryptlib.h"), then the
question remains relative to which directory this is to be interpreted.
gcc went one further directory up, as intended; but makedepend thinks
differently, and so probably do some C compilers. So the ../ must go away;
thus e_os.h goes back into include/openssl (but I now use
#include "openssl/e_os.h" instead of <openssl/e_os.h> to make the point) --
and we have another huge bunch of dependency changes. Argh.
There were problems with putting e_os.h just into the top directory,
because the test programs are compiled within test/ in the "standard"
case in in their original directories in the makefile.one case;
and in the latter symlinks may not be available.
used with negative char values, so I've added casts to unsigned char.
Maybe what really should be done is change all those arrays and
pointers to type unsigned char [] or unsigned char *, respectively;
but using plain char with those predicates is just wrong, so something
had to be done.
Submitted by:
Reviewed by:
PR: