1. Original submission required minor modification to RC4_set_key, which
we don't want to tolerate and therefore we fix assembler instead.
2. Eliminate remaining byte-order dependence [look for RC4_BIG_ENDIAN].
3. Eliminate logical error [when key->x is referred prior key is verified].
4. HP-UX assembler puked on MODSCHED_RC4 macro with "syntax error,"
macro has to be splitted in two.
5. Deploy parallel compare in function prologue.
6. Eliminate redundant instuctions and nops.
7. Eliminate assembler warnings.
room for aligning of the key schedule itself [specific alignment is
required for future performance improvements], but OpenSSH "abuses"
our API by making copies and restoring RC4_KEY, thus ruining the
alignment and making it impossible to recover the key schedule.
PR: 1114
certain mix of calls to RC4 routine not covered by rc4test.c.
It's fixed now. In addition this patch inadvertently fixes minor
performance problem: in 0.9.7 context P4 was performing 12% slower
than the original implementation...
apparently impossible to compose blended code with would perform
satisfactory on all x86 and x86_64 cores, an extra RC4_CHAR
code-path is introduced and P4 core is detected at run-time. This
way we keep original performance on non-P4 implementations and
turbo-charge P4 performance by factor of 2.8x (on 32-bit core).