apparently impossible to compose blended code with would perform
satisfactory on all x86 and x86_64 cores, an extra RC4_CHAR
code-path is introduced and P4 core is detected at run-time. This
way we keep original performance on non-P4 implementations and
turbo-charge P4 performance by factor of 2.8x (on 32-bit core).
1. The already released version was 0.9.1c and not 0.9.1b
2. The next release should be 0.9.2 and not 0.9.1d, because
first the changes are already too large, second we should avoid any more
0.9.1x confusions and third, the Apache version semantics of
VERSION.REVISION.PATCHLEVEL for the version string is reasonable (and here
.2 is already just a patchlevel and not major change).
tVS: ----------------------------------------------------------------------