/*
* This release balances code size and performance. In particular, the
* key schedule setup is fully unrolled, because doing so *significantly*
* reduces the number of instructions per setup round and the increase
* in code size is justifiable. In the block functions, on the other
* hand, only the inner loops are unrolled, as a full unroll gives only
* a nominal performance boost while code size grows 4 or 7 times. Also,
* unlike previous versions, this one "encourages" the compiler to keep
* intermediate variables in registers, which should give better
* "all-round" results, in other words reasonable performance even with
* not-so-modern compilers.
*/