Commit graph

192 commits

Author SHA1 Message Date
Andy Polyakov
70b76d392f ppccap.c: fix compiler warning and perform sanity check outside signal masking.
ppc64-mont.pl: clarify comment and fix spelling.
2009-12-29 11:18:16 +00:00
Andy Polyakov
3fc2efd241 PA-RISC assembler: missing symbol and typos. 2009-12-28 16:13:35 +00:00
Andy Polyakov
cb3b9b1323 Throw in more PA-RISC assembler. 2009-12-27 20:49:40 +00:00
Andy Polyakov
d741cf2267 ppccap.c: tidy up.
ppc64-mont.pl: missing predicate in commentary.
2009-12-27 11:25:24 +00:00
Andy Polyakov
b4b48a107c ppc64-mont.pl: adapt for 32-bit and engage for all builds. 2009-12-26 21:30:13 +00:00
Andy Polyakov
0f529cbdc3 s390x-mont.pl: optimize prologue. 2009-02-10 08:46:48 +00:00
Andy Polyakov
8626230a02 s390x assembler pack update. 2009-02-09 15:42:04 +00:00
Dr. Stephen Henson
41b7619596 Fix missing prototype warnings then fix different prototype warnings ;-) 2009-01-11 16:17:26 +00:00
Andy Polyakov
be01f79d3d x86_64 assembler pack: add support for Win64 SEH. 2008-12-19 11:17:29 +00:00
Andy Polyakov
93c4ba07d7 x86_64-xlate.pl update, engage x86_64 assembler in mingw64. 2008-11-14 16:40:37 +00:00
Andy Polyakov
aa8f38e49b x86_64 assembler pack to comply with updated styling x86_64-xlate.pl rules. 2008-11-12 08:15:52 +00:00
Geoff Thorpe
6343829a39 Revert the size_t modifications from HEAD that had led to more
knock-on work than expected - they've been extracted into a patch
series that can be completed elsewhere, or in a different branch,
before merging back to HEAD.
2008-11-12 03:58:08 +00:00
Dr. Stephen Henson
9619b730b4 Use stddef.h to pick up size_t def. 2008-11-02 16:56:13 +00:00
Dr. Stephen Henson
c76fd290be Fix warnings about mismatched prototypes, undefined size_t and value computed
not used.
2008-11-02 12:50:48 +00:00
Ben Laurie
4d6e1e4f29 size_tification. 2008-11-01 14:37:00 +00:00
Andy Polyakov
492279f6f3 AIX build updates. 2008-09-12 14:45:54 +00:00
Andy Polyakov
61b05a0025 Make x86_64-mont.pl work with debug Win64 build. 2008-02-27 20:09:28 +00:00
Andy Polyakov
089458b096 ppc64-mont optimization. 2008-02-05 13:10:14 +00:00
Andy Polyakov
addd641f3a Unify ppc assembler make rules. 2008-01-13 22:01:30 +00:00
Dr. Stephen Henson
4d1f3f7a6c Update perl asm scripts include paths for perlasm. 2008-01-05 22:28:38 +00:00
Andy Polyakov
c8ec4a1b0b Final (for this commit series) optimized version and with commentary section. 2007-12-29 20:30:09 +00:00
Andy Polyakov
699e1a3a82 This is also informational commit exposing loop modulo scheduling "factor." 2007-12-29 20:28:01 +00:00
Andy Polyakov
64214a2183 New Montgomery multiplication module, ppc64-mont.pl. Reference, non-optimized
implementation. This is essentially informational commit.
2007-12-29 20:26:46 +00:00
Andy Polyakov
7722e53f12 Yet another ARM update. It appears to be more appropriate to make
developers responsible for -march choice.
2007-09-27 16:27:03 +00:00
Andy Polyakov
673c55a2fe Latest bn_mont.c modification broke ECDSA test. I've got math wrong, which
is fixed now.
2007-06-29 13:10:19 +00:00
Andy Polyakov
5b89f78a89 Typo in x86_64-mont.pl.
PR: 1549
2007-06-21 11:38:52 +00:00
Andy Polyakov
1c7f8707fd bn_asm for s390x. 2007-06-20 14:10:16 +00:00
Andy Polyakov
2329694222 SPARC Solaris and Linux assemblers treat .align directive differently.
PR: 1547
2007-06-20 12:24:22 +00:00
Andy Polyakov
7d9cf7c0bb Eliminate conditional final subtraction in Montgomery assembler modules. 2007-06-17 17:10:03 +00:00
Andy Polyakov
a2a54ffc5f s390x assembler pack. 2007-04-30 08:42:54 +00:00
Andy Polyakov
8b71d35458 nasm fixes. 2007-03-20 08:55:58 +00:00
Andy Polyakov
760e353528 sparcv9a-mont was modified to handle 32-bit aligned input, but check
for 64-bit alignment was not removed.
2007-03-20 08:54:51 +00:00
Andy Polyakov
64aecc6720 Make armv4t-mont module backward binary compatible with armv4 and rename it
accordingly.
2007-01-17 20:12:41 +00:00
Andy Polyakov
43b8fe1cd0 Montgomery multiplication for ARMv4. 2007-01-11 21:43:25 +00:00
Andy Polyakov
8876e58f34 Montgomery multiplication for MIPS III/IV. Not engaged. 2006-12-29 11:09:33 +00:00
Andy Polyakov
7321a84d4c Minor clean-up in crypto/bn/asm. 2006-12-29 11:05:20 +00:00
Andy Polyakov
4cfe3df1f5 Minor performance improvements to x86-mont.pl. 2006-12-28 12:43:16 +00:00
Andy Polyakov
8f2d60ec26 Fix for "strange errors" exposed by ccgost engine. The fix is
two extra insructions in sqradd loop at line #503.
2006-12-27 10:59:51 +00:00
Andy Polyakov
1702c8c4bf x86-mont.pl sse2 tune-up and integer-only squaring procedure. 2006-12-22 15:28:07 +00:00
Andy Polyakov
87d3af6475 Eliminate 64-bit alignment limitation in sparcv9a-mont. 2006-12-08 15:18:41 +00:00
Andy Polyakov
98939a05b6 alpha-mont.pl: gcc portability fix and make-rule. 2006-12-08 14:18:58 +00:00
Andy Polyakov
d28134b8f3 Minor, +10%, tune-up for x86_64-mont.pl. 2006-12-08 10:13:51 +00:00
Andy Polyakov
8583eba015 Montgomery multiplication routine for Alpha. 2006-12-08 10:12:56 +00:00
Andy Polyakov
73b979e601 Clarify HAL SPARC64 support situation in sparcv9a-mont.pl. 2006-11-28 11:07:36 +00:00
Andy Polyakov
ebae8092cb Minor optimizations based on intruction level profiler feedback. 2006-11-28 10:34:51 +00:00
Andy Polyakov
2e21922eb6 Modulo-schedule loops in sparcv9a-mont.pl. Overall improvement factor
over 0.9.8 is up to 3x on USI&II cores and up to 80% - on USIII&IV.
2006-11-28 07:24:26 +00:00
Andy Polyakov
1c3d2b94be This is "informational" commit. Its mere purpose is to expose "modulo
factor" in inner loops.
2006-11-28 07:20:36 +00:00
Andy Polyakov
48d2335d73 Non-SSE2 path to bn_mul_mont. But it's disabled, because it currently
doesn't give performance improvement.
2006-11-27 14:59:35 +00:00
Andy Polyakov
31439046e0 bn/asm/ppc.pl to use ppc-xlate.pl. 2006-10-17 14:37:07 +00:00
Andy Polyakov
cecfdbf72d VIA-specific Montgomery multiplication routine. 2006-10-17 07:04:48 +00:00
Andy Polyakov
8ea975d070 +20% tune-up for Power5. 2006-08-09 15:40:30 +00:00
Andy Polyakov
c8a0d0aaf9 Engage assembler in solaris64-x86_64-cc. 2006-07-31 22:28:40 +00:00
Andy Polyakov
67d990904e Futher minor PPC assembler update. 2006-05-04 21:30:41 +00:00
Andy Polyakov
c09a0318b7 Minor PPC assembler updates. 2006-05-03 14:07:34 +00:00
Andy Polyakov
2c5d4daac5 Yet another "teaser" Montgomery multiplication module, for PowerPC. 2006-04-30 21:15:29 +00:00
Andy Polyakov
7a5dbeb782 Minor sparcv9 clean-ups. 2005-12-27 21:27:39 +00:00
Andy Polyakov
3b4a0225e2 As SPARCV9 CPU flavor is [expected to be] detected at run-time, we can
afford to relax SPARCV9/8+ compiler command line and produce "unversal"
binaries as we used to.
2005-12-19 09:10:06 +00:00
Andy Polyakov
a00e414faf Unify sparcv9 assembler naming and build rules among 32- and 64-bit builds.
Engage run-time switch between bn_mul_mont_fpu and bn_mul_mont_int.
2005-12-16 17:39:57 +00:00
Andy Polyakov
68ea60683a Add IALU-only bn_mul_mont for SPARCv9. See commentary section for details. 2005-12-15 22:43:33 +00:00
Andy Polyakov
6df8c74d5b Switch 64-bit sparcv9 platforms from bn(64,64) to bn(64,32). This doesn't
have impact on performance, because amount of multiplications does not
increase with this switch, not on sparcv9 that is. On the contrary, it
actually improves performance, because it spares a load of instructions
used to chase carries. Not to mention that BN assembler modules can be
shared more freely between 32- and 64-bit builts.
2005-12-15 22:40:58 +00:00
Andy Polyakov
07645deeb8 Apply "better safe than sorry" approach after addressing sporadic SEGV in
bn_sub_words to the rest of the sparcv8plus.S.
2005-11-15 08:02:10 +00:00
Andy Polyakov
c52c82ffc1 Attempt to resolve sporadic SEGV crashes in bn_sub_words in OpenSSH. I'm
baffled why it crashes and does it sporadically...
2005-11-11 20:07:07 +00:00
Andy Polyakov
a4d729f31d Clarify binary compatibility with HAL/Fujitsu SPARC64 family. 2005-10-25 15:39:47 +00:00
Andy Polyakov
aa2be094ae Add support for 32-bit ABI to sparcv9a-mont.pl module. 2005-10-22 18:16:09 +00:00
Andy Polyakov
4d524040bc Change bn_mul_mont declaration and BN_MONT_CTX. Update CHANGES. 2005-10-22 17:57:18 +00:00
Andy Polyakov
bcb43bb358 Yet another "teaser" Montgomery multiply module, for UltraSPARC. It's not
integrated yet, but it's tested and benchmarked [see commentary section
for further details].
2005-10-19 07:12:06 +00:00
Andy Polyakov
34736de4c0 Flip saved argument block and tp [required for non-SSE2 path]. 2005-10-14 16:05:21 +00:00
Andy Polyakov
5f50d597f2 Make sure x86-mont.pl returns zero even if compiled with no-sse2. 2005-10-14 15:24:06 +00:00
Andy Polyakov
35593b33f4 Add timestamp to x86-mont.pl. 2005-10-09 10:26:56 +00:00
Andy Polyakov
54f3d200d3 Throw in bn/asm/x86-mont.pl Montgomery multiplication "teaser". 2005-10-09 09:53:58 +00:00
Andy Polyakov
7a2f4cbfe8 x86_64-mont.pl readability improvement. 2005-10-07 15:18:16 +00:00
Andy Polyakov
5ac7bde7c9 Throw in Montgomery multiplication assembler for x86_64. 2005-10-07 14:18:06 +00:00
Andy Polyakov
6f9afa68cd IA-32 BN tune-up. Performance imrpovement varies with platform and
keylength, this time larger improvement for shorter keys, and reaches
15%. Both SSE2 and IALU code pathes are improved.
2005-09-20 12:26:54 +00:00
Andy Polyakov
7cfe2a5e65 Fix Intel assembler warnings. 2005-08-10 08:28:36 +00:00
Andy Polyakov
ef428d5681 Fix unwind directives in IA-64 assembler modules. This helps symbolic
debugging and doesn't affect functionality.

Submitted by: David Mosberger

Obtained from: http://www.hpl.hp.com/research/linux/crypto/
2005-07-18 09:54:14 +00:00
Andy Polyakov
31efffbdba Trap condition should be 64-bit when it's due. 2005-07-03 09:17:50 +00:00
Andy Polyakov
aaa5dc614f More elegant solution to "sparse decimal printout on PPC" problem. 2005-07-02 08:58:55 +00:00
Andy Polyakov
8be97c01d1 Decimal printout of a BN is wrong on PPC, it's sparse with very few
significant digits. As soon it verifies elsewhere it goes to 0.9.8 and
0.9.7.
2005-07-01 17:49:47 +00:00
Richard Levitte
4bb61becbb Add emacs cache files to .cvsignore. 2005-04-11 14:17:07 +00:00
Andy Polyakov
60fd574cdf Make bn/asm/x86_64-gcc.c gcc4 savvy. +r is likely to be initially
introduced for a reason [like bug in initial gcc port], but proposed
=&r is treated correctly by senior 3.2, so we can assume it's safe now.
PR: 1031
2005-04-03 18:53:29 +00:00
Andy Polyakov
da30c74a27 Remove unused assembler modules. 2005-02-06 13:43:02 +00:00
Andy Polyakov
2b247cf81f OPENSSL_ia32cap final touches. Note that OPENSSL_ia32cap is no longer a
symbol, but a macro expanded as (*(OPENSSL_ia32cap_loc())). The latter
is the only one to be exported to application.
2004-08-29 16:36:05 +00:00
Andy Polyakov
859ceeeb51 Anchor AES and SHA-256/-512 assembler from C. 2004-07-18 17:26:01 +00:00
Andy Polyakov
e2f2a9af2c New scalable bn_mul_add_words loop, which provides up to >20% overall
performance improvement. Make module more gcc friendly and clarify
copyright issues for division routine.
2004-07-01 11:10:38 +00:00
Andy Polyakov
1809e858bb Eliminate compiler warnings and throw in performance table. 2004-05-28 10:15:58 +00:00
Andy Polyakov
d3adc3d3ed SSE2 accelerated bn_mul_add_words. Code is currently disabled till proper
config and run-time support is added.
PR: 788
Submitted by: <dean@arctic.org>
Reviewed by: <appro>

Obtained from: http://arctic.org/~dean/crypto/rsa.html
2004-05-06 10:36:49 +00:00
Andy Polyakov
dd55880644 Improved PowerPC support. Proper ./config support for ppc targets,
especially for AIX. But most important BIGNUM assembler implementation
submitted by IBM.

Submitted by: Peter Waltenberg <pwalten@au1.ibm.com>
Reviewed by: appro
2004-04-27 22:05:50 +00:00
Andy Polyakov
1751034669 Typo in crypto/bn/asm/x86_64.c, bn_div_words().
PR: 821
2004-02-07 09:51:28 +00:00
Andy Polyakov
722d17cbac This is an *initial* tune-up. This update puts Itanium2 back on par with
Itanium. I mean if overall performance improvement over C version was X
for Itanium, it's X even for Itanium2.
2003-01-19 21:29:59 +00:00
Richard Levitte
2f09524501 A few more files to ignore 2003-01-16 21:32:56 +00:00
Andy Polyakov
46a0d4fbcb Support for ILP32 on HPUX-IA64. 2003-01-03 15:10:46 +00:00
Andy Polyakov
04945fda66 pa-risc2.s was not PIC, see RT#426. I strip call to fprintf as it's
never called anyway (it's a debugging assertion). If pa-risc2W.s is
PIC remains to be seen...
2003-01-03 10:52:40 +00:00
Richard Levitte
e9883d285d Finally, a bn_div_words() in VAX assembler that goes through all tests.
PR: 413
2002-12-23 11:25:51 +00:00
Richard Levitte
9b58214e4a More accurate comments. 2002-12-20 16:38:36 +00:00
Andy Polyakov
2f98abbcb6 x86_64 performance patch. 2002-12-14 20:42:05 +00:00
Richard Levitte
6ab285bf4c I think I got it now. Apparently, the case of having to shift down
the divisor was a bit more complex than I first saw.  The lost bit
can't just be discarded, as there are cases where it is important.
For example, look at dividing 320000 with 80000 vs. 80001 (all
decimals), the difference is crucial.  The trick here is to check if
that lost bit was 1, and in that case, do the following:

1. subtract the quotient from the remainder
2. as long as the remainder is negative, add the divisor (the whole
   divisor, not the shofted down copy) to it, and decrease the
   quotient by one.

There's probably a nice mathematical proof for this already, but I
won't bother with that, unless someone requests it from me.
2002-12-02 21:31:45 +00:00
Richard Levitte
1d3159bcca Make some names consistent. 2002-12-02 02:40:27 +00:00
Richard Levitte
f60ceb54eb Through some experimentation and thinking, I think I finally got the
proper implementation of bn_div_words() for VAX.

If the tests go through well, the next step will be to test on Alpha.
2002-12-02 02:28:27 +00:00
Richard Levitte
0f995b2f40 Small bugfix: even when r == d, we need to adjust r and q.
PR: 366
2002-12-01 02:17:23 +00:00
Richard Levitte
a678430602 Redo the VAX assembler version of bn_div_words().
PR: 366
2002-12-01 00:49:36 +00:00