Andy Polyakov
6251989eb6
x86_64 assembly pack: make it possible to compile with Perl located on
...
path with spaces.
PR: 2835
2012-06-27 10:08:23 +00:00
Andy Polyakov
d2e1803197
x86[_64] assembly pack: update benchmark results.
2012-06-12 14:18:21 +00:00
Andy Polyakov
c608171d9c
Add RC4-MD5 and AESNI-SHA1 "stitched" implementations.
2011-08-23 20:51:38 +00:00
Andy Polyakov
a908b711ac
rc4-586.pl: add Atom performance results.
2011-06-28 12:36:10 +00:00
Andy Polyakov
0772f3b4f6
rc4-x86_64.pl: commentary update.
2011-06-27 09:46:16 +00:00
Andy Polyakov
fe9a5107be
Various mingw64 fixes.
2011-05-29 13:51:14 +00:00
Andy Polyakov
f44cb15fab
rc4-x86_64.pl: fix due credit.
2011-05-27 18:58:37 +00:00
Andy Polyakov
986289604e
rc4-x86_64.pl: RC4_options fix-up.
2011-05-27 16:15:12 +00:00
Andy Polyakov
4bb90087d7
x86[_64]cpuid.pl: harmonize usage of reserved bits #20 and #30 .
2011-05-27 15:32:43 +00:00
Andy Polyakov
0ca9a483af
rc4-x86_64.pl: major optimization for contemporary Intel CPUs.
2011-05-27 09:51:09 +00:00
Andy Polyakov
0dff8ba248
rc4-586.pl: optimize even further...
2011-05-27 09:46:19 +00:00
Andy Polyakov
6a99984b57
rc4-586.pl: optimize unused code path.
2011-05-25 09:36:13 +00:00
Andy Polyakov
760d2551fb
rc4-586.pl: 50% improvement on Core2 and 80% on Westmere.
2011-05-24 13:07:29 +00:00
Andy Polyakov
e822c756b6
s390x assembler pack: adapt for -m31 build, see commentary in Configure
...
for more details.
2010-11-29 20:52:43 +00:00
Andy Polyakov
6219d2c294
rc4-s390x.pl: harmonize build rule with other similar rules.
2010-07-26 21:56:16 +00:00
Andy Polyakov
629fd3aa91
rc4-x86_64.pl: "Westmere" optimization.
2010-05-13 21:01:24 +00:00
Andy Polyakov
3efe51a407
Revert previous Linux-specific/centric commit#19629. If it really has to
...
be done, it's definitely not the way to do it. So far answer to the
question was to ./config -Wa,--noexecstack (adopted by RedHat).
2010-05-05 22:05:39 +00:00
Ben Laurie
0e3ef596e5
Non-executable stack in asm.
2010-05-05 15:50:13 +00:00
Andy Polyakov
cb3b9b1323
Throw in more PA-RISC assembler.
2009-12-27 20:49:40 +00:00
Andy Polyakov
75d448dde4
Handle push/pop %rbx in epi/prologue (this is Win64 SEH thing).
2009-04-26 17:58:01 +00:00
Andy Polyakov
c558c99fd8
rc4-s390x.pl: allow for older assembler and optimize character loop.
2009-02-12 14:48:49 +00:00
Andy Polyakov
13c3a1defa
RC4 for s390x.
2009-02-11 10:01:36 +00:00
Andy Polyakov
be01f79d3d
x86_64 assembler pack: add support for Win64 SEH.
2008-12-19 11:17:29 +00:00
Andy Polyakov
aa8f38e49b
x86_64 assembler pack to comply with updated styling x86_64-xlate.pl rules.
2008-11-12 08:15:52 +00:00
Andy Polyakov
a078befcbe
rc4-x86_64 portability fix.
2008-01-12 11:29:45 +00:00
Andy Polyakov
544b82e493
Some assembler are allergic to lea reg,BYTE PTR[...].
...
Submitted by: Guenter Knauf
2007-12-02 21:32:03 +00:00
Andy Polyakov
20c04a13e6
Reimplement rc4-586.pl, relicense rc4-x86_64.pl.
2007-04-26 20:48:38 +00:00
Andy Polyakov
9babf3929b
RC4_set_key for x86_64 and Core2 optimization.
...
PR: 1447
2007-04-02 09:50:14 +00:00
Andy Polyakov
2875462425
Reserve for assembler implementation of RC4_set_key and implement x86 one.
2007-04-01 17:01:12 +00:00
Andy Polyakov
de50494505
Two extra instructions in RC4 character loop give 80% performance
...
improvement on Core2. I still need to detect Core2 and choose this
path...
2007-03-20 09:13:07 +00:00
Andy Polyakov
2802ec65c2
Pedantic polish to rc4-ia64.pl.
2005-07-20 11:47:47 +00:00
Andy Polyakov
26c07054a1
Retire original rc4-ia64.S.
2005-07-18 18:59:21 +00:00
Andy Polyakov
4ac210c16a
This update implements following improvements.
...
1. Original submission required minor modification to RC4_set_key, which
we don't want to tolerate and therefore we fix assembler instead.
2. Eliminate remaining byte-order dependence [look for RC4_BIG_ENDIAN].
3. Eliminate logical error [when key->x is referred prior key is verified].
4. HP-UX assembler puked on MODSCHED_RC4 macro with "syntax error,"
macro has to be splitted in two.
5. Deploy parallel compare in function prologue.
6. Eliminate redundant instuctions and nops.
7. Eliminate assembler warnings.
2005-07-18 17:11:13 +00:00
Andy Polyakov
02703c74a4
Unrolled RC4 IA-64 loop gives 40% improvement over current assembler
...
implementation [as predicted].
Submitted by: David Mosberger
Obtained from: http://www.hpl.hp.com/research/linux/crypto/
2005-07-18 16:55:52 +00:00
Andy Polyakov
ef428d5681
Fix unwind directives in IA-64 assembler modules. This helps symbolic
...
debugging and doesn't affect functionality.
Submitted by: David Mosberger
Obtained from: http://www.hpl.hp.com/research/linux/crypto/
2005-07-18 09:54:14 +00:00
Andy Polyakov
a4022932ee
Omit padding in RC4_KEY on IA-64. The idea behind padding was to reserve
...
room for aligning of the key schedule itself [specific alignment is
required for future performance improvements], but OpenSSH "abuses"
our API by making copies and restoring RC4_KEY, thus ruining the
alignment and making it impossible to recover the key schedule.
PR: 1114
2005-06-26 16:09:29 +00:00
Andy Polyakov
804515425a
+20% performance improvement of P4-specific RC4_CHAR loop.
2005-05-15 22:43:00 +00:00
Andy Polyakov
0ee883650d
Commentary update motivating code update in 0.9.7.
2005-05-04 14:51:38 +00:00
Andy Polyakov
647907918d
Commentary update.
2005-05-03 21:16:42 +00:00
Andy Polyakov
5f1841cdca
Rename amd64 modules to x86_64 and update RC4 implementation.
2005-05-03 15:42:05 +00:00
Andy Polyakov
1cfd258ed6
Throw in x86_64 AT&T to MASM assembler converter to facilitate development
...
of dual-ABI Unix/Win64 modules.
2005-04-17 21:05:57 +00:00
Richard Levitte
4bb61becbb
Add emacs cache files to .cvsignore.
2005-04-11 14:17:07 +00:00
Andy Polyakov
81ee80ab88
+45% RC4 performance boost on Intel EM64T core. Unrolled loop providing
...
further +35% will follow...
Submitted by: Zou Nanhai
2005-04-06 09:45:42 +00:00
Andy Polyakov
f774accdbf
Fix rc4-ia64.S to pass more exhaustive regression tests.
2004-12-02 10:07:55 +00:00
Andy Polyakov
7c69478064
I've introduced a bug to i386 RC4 assembler, which would emerge with
...
certain mix of calls to RC4 routine not covered by rc4test.c.
It's fixed now. In addition this patch inadvertently fixes minor
performance problem: in 0.9.7 context P4 was performing 12% slower
than the original implementation...
2004-12-01 15:28:18 +00:00
Andy Polyakov
b7b46c9a87
Add 0.9.7 specific comments to RC4 assembler modules.
2004-11-30 15:46:46 +00:00
Andy Polyakov
7a3240e319
Final touches to rc4/asm/rc4-596.pl, +52% better performance on AMD core.
2004-11-29 21:12:58 +00:00
Andy Polyakov
d675c74d14
RC4 IA-64 assembler implementation.
2004-11-26 15:07:50 +00:00
Andy Polyakov
376729e130
RC4 tune-up for Intel P4 core, both 32- and 64-bit ones. As it's
...
apparently impossible to compose blended code with would perform
satisfactory on all x86 and x86_64 cores, an extra RC4_CHAR
code-path is introduced and P4 core is detected at run-time. This
way we keep original performance on non-P4 implementations and
turbo-charge P4 performance by factor of 2.8x (on 32-bit core).
2004-11-21 10:36:25 +00:00
Andy Polyakov
68d9e764cb
As was shown by Marc Bevand reordering of couple of load operations
...
results in even higher performance gain of 3.3x:-) At least on
Opteron...
2004-11-09 17:23:26 +00:00