apparently impossible to compose blended code with would perform
satisfactory on all x86 and x86_64 cores, an extra RC4_CHAR
code-path is introduced and P4 core is detected at run-time. This
way we keep original performance on non-P4 implementations and
turbo-charge P4 performance by factor of 2.8x (on 32-bit core).
is to have a placeholder to small routines, which can be written only
in assembler. In IA-32 case this includes processor capability
identification and access to Time-Stamp Counter. As discussed earlier
OPENSSL_ia32cap is introduced to control recently added SSE2 code
pathes (see docs/crypto/OPENSSL_ia32cap.pod). For the moment the
code is operational on ELF platforms only. I haven't checked it yet,
but I have all reasons to believe that Windows build should fail to
link too. I'll be looking into it shortly...