I'm a little bit nervous about bn_div_words, as I don't know what it's
supposed to return on overflow. For now, I trust the rest of the
system to give it numbers that will not cause any overflow...
BN_mul() correctly constified, avoids two realloc()'s that aren't
really necessary and saves memory to boot. This required a small
change in bn_mul_part_recursive() and the addition of variants of
bn_cmp_words(), bn_add_words() and bn_sub_words() that can take arrays
with differing sizes.
The test results show a performance that very closely matches the
original code from before my constification. This may seem like a
very small win from a performance point of view, but if one remembers
that the variants of bn_cmp_words(), bn_add_words() and bn_sub_words()
are not at all optimized for the moment (and there's no corresponding
assembler code), and that their use may be just as non-optimal, I'm
pretty confident there are possibilities...
This code needs reviewing!
two functions that did expansion on in parameters (BN_mul() and
BN_sqr()). The problem was solved by making bn_dup_expand() which is
a mix of bn_expand2() and BN_dup().
BN_mod_mul_montgomery, which calls bn_sqr_recursive
without much preparation.
bn_sqr_recursive requires the length of its argument to be
a power of 2, which is not always the case here.
There's no reason for not using BN_sqr -- if a simpler
approach to squaring made sense, then why not change
BN_sqr? (Using BN_sqr should also speed up DH where g is chosen
such that it becomes small [e.g., 2] when converted
to Montgomery representation.)
Case closed :-)
Also, "make update" has added some missing functions to libeay.num,
updated the TABLE for the alpha changes, and updated thousands of
dependancies that have changed from recent commits.
because we're only handling words anyway) in BN_mod_exp_mont_word
making it a little faster for very small exponents,
and adjust the performance gain estimate in CHANGES according
to slightly more thorough measurements.
(15% faster than BN_mod_exp_mont for "large" base,
20% faster than BN_mod_exp_mont for small base.)
like Malloc, Realloc and especially Free conflict with already existing names
on some operating systems or other packages. That is reason enough to change
the names of the OpenSSL memory allocation macros to something that has a
better chance of being unique, like prepending them with OPENSSL_.
This change includes all the name changes needed throughout all C files.
Remove one ampersand so the compiler may complain less.
Make rand() static so it will not conflict with the C RTL.
Make bug() static too, for good measure.
Apparently BN_div_recp reports an error for small divisors
(1,2,4,8,40).
I haven't got mismatches so far. If you can, please run the test
program for a few days (nohup divtest >out& or something), and if it
reports a mismatch, post the output.
X is 5120 on 32-bit and 151552 on 64-bit architectures and I varies
from 0 to 4. As result the test was *unreasonably* slow and virtually
impossible to complete on 64-bit architectures (e.g. IRIX bc couldn't
even swallow such long lines).
crypto/bn/bn_lcl.h for further details. It should be noted that for
the moment of this writing the code was tested only on Alpha. If
compiled with DEC C the C implementation exhibits 12% performance
improvement over the crypto/bn/asm/alpha.s (on EV56 box running
AlphaLinux). GNU C is (unfortunately) 8% behind the assembler
implementation. But it's OpenVMS Alpha users who *may* benefit most
as 'apps/openssl speed rsa' exhibits 6 (six) times performance
improvement over the original VMS bignum implementation. Where "*may*"
means "as soon as code is enabled though #define SIXTY_FOUR_BIT and
crypto/bn/asm/vms.mar is skipped."
because otherwise BN_rand will fail unless DEVRANDOM works,
which causes the programs to dump core because they
don't check the return value of BN_rand (and if they
did, we still couldn't test anything).
- add comment to some files that appear not to be used at all.
returns int (1 = ok, 0 = not seeded). New function RAND_add() is the
same as RAND_seed() but takes an estimate of the entropy as an additional
argument.
problem was that one of the replacement routines had not been working since
SSLeay releases. For now the offending routine has been replaced with
non-optimised assembler. Even so, this now gives around 95% performance
improvement for 1024 bit RSA signs.
the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.
CPU BN_div bn_div_words overall comment
------------------------------------------------------------------------
PII +16% accumulated by +2-3% PII multiplies damn fast! Taking
inlining multiplication out of the loop
didn't make too much difference.
Eliminating of the multiplication
involved in remainder calculation
is the major factor.
Pentium +45% accumulated by +7-9% mull isn't that fast and replacing
inlining multiplications with additions in
the loop has more visible effect:-)
MIPS +75% +12% +20-25% In addition to the taking mults
R10000 out of the loop (giving 12% in the
asm/mips3.s) three mults were
eliminated in BN_div.
Alpha +30% +50% +10-15% Same as above. But remember that
EV4 bn_div_words is a C implementation.
It takes 4 Alpha mults in C to do
the same thing as 1 MIPS mult in
assembler does. So the effect (50%)
is more impressive. But not the
overall one... Well, if Alpha
bn_mul_add would be implemented
in assembler overall improvement
would be closer to MIPS...
in cryptlib.h (which is often included as "../cryptlib.h"), then the
question remains relative to which directory this is to be interpreted.
gcc went one further directory up, as intended; but makedepend thinks
differently, and so probably do some C compilers. So the ../ must go away;
thus e_os.h goes back into include/openssl (but I now use
#include "openssl/e_os.h" instead of <openssl/e_os.h> to make the point) --
and we have another huge bunch of dependency changes. Argh.
There were problems with putting e_os.h just into the top directory,
because the test programs are compiled within test/ in the "standard"
case in in their original directories in the makefile.one case;
and in the latter symlinks may not be available.
used with negative char values, so I've added casts to unsigned char.
Maybe what really should be done is change all those arrays and
pointers to type unsigned char [] or unsigned char *, respectively;
but using plain char with those predicates is just wrong, so something
had to be done.
Submitted by:
Reviewed by:
PR:
to error code script: it can now find untranslatable function codes (usually
because the function is static and not defined in a header: occasionally because
of a typo...) and unreferenced function and reason codes. To see this try:
perl util/mkerr.pl -recurse -debug
Also fixed some typos in crypto/pkcs12 that this found :-)
Also tidy up some error calls that had to be all on one line: the old error
script couldn't find codes unless the call was all on one line.
script, translates function codes better and doesn't need the K&R function
prototypes to work (NB. the K&R prototypes can't be wiped just yet: they are
still needed by the DEF generator...). I also ran the script with the -rewrite
option to update all the header and source files.
when they cause the destination to expand.
To see how evil this is try this:
#include <pem.h>
main()
{
BIGNUM *bn = NULL;
int i;
bn = BN_new();
BN_hex2bn(&bn, "FFFFFFFF");
BN_add_word(bn, 1);
printf("Value %s\n", BN_bn2hex(bn));
}
This would typically fail before the patch.
It also screws up if you comment out the BN_hex2bn line above or in any
situation where BN_add_word() causes the number of BN_ULONGs in the result
to change (try doubling the number of FFs).
consistent in the source tree and replaced `/bin/rm' by `rm'. Additonally
cleaned up the `make links' target: Remove unnecessary semicolons, subsequent
redundant removes, inline point.sh into mklink.sh to speed processing and no
longer clutter the display with confusing stuff. Instead only the actually
done links are displayed.
1. merge various obsolete readme texts into doc/ssleay.txt
where we collect the old documents and readme texts.
2. remove the first part of files where I'm already sure that we no longer need
them because of three reasons: either they are just temporary files which
were left by Eric or they are preserved original files where I've verified
that the diff is also available in the CVS via "cvs diff -rSSLeay_0_8_1b"
or they were renamed (as it was definitely the case for the crypto/md/
stuff).
We've still a horrible mess under crypto/bn/asm/. There for a lot of files
I'm sure whether we need them or not. So, when someone knows it better, feel
free to cleanup there.
1. Add *lots* of missing prototypes for static ssl functions.
2. VC++ doesn't understand the 'LL' suffix for 64 bits constants: change bn.org
3. Add a few missing prototypes in pem.org
Fix mk1mf.pl so it outputs a Makefile that doesn't choke Win95.
Fix mkdef.pl so it doesn't truncate longer names.
but the BN code had some problems that would cause failures when
doing certificate verification and some other functions.
Submitted by: Eric A Young from a C2Net version of SSLeay
Reviewed by: Mark J Cox
PR:
1. The already released version was 0.9.1c and not 0.9.1b
2. The next release should be 0.9.2 and not 0.9.1d, because
first the changes are already too large, second we should avoid any more
0.9.1x confusions and third, the Apache version semantics of
VERSION.REVISION.PATCHLEVEL for the version string is reasonable (and here
.2 is already just a patchlevel and not major change).
tVS: ----------------------------------------------------------------------