Notice that 'CBC-EDE' mode (which we all agree is in principle faster
if you're using 3 DES chips in parallel) is the slowest with these
particular software implementations.
These results are *very* strange. They are opposite to the results which I
get with the Stratus DES code (which has a CBC function optimized to
operate over a block of inputs).
Give me a break! They are not strange at all. In software, the
performance of well-tuned implementations of the two algorithms is
going to be nearly identical. If you use software tuned for one
algorithm to implement the other, it's going to be slower.
The same will be true with hardware, only more so. I have some
hardware that will do the three-loop case 10-100 times faster than the
one-loop case. It doesn't matter. It would be easy to build hardware
that went the other way. It seems fairly unlikely that hardware
acceleration is going to be crucially important for PEM, and even if
it were, I'd be hard-pressed to guess which sort of hardware is more
likely to become prevalent.
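
For anyone who has lost track of what the two cases look like in code,
here is a rough sketch. It assumes "one loop" means a single CBC chain
wrapped around the whole encrypt-decrypt-encrypt operation on each block,
and "three loop" means three complete CBC passes over the message, one per
key. The functions toy_e and toy_d are hypothetical stand-ins for DES
(a reversible 64-bit scramble, not secure), and all the other names and the
per-pass IVs are likewise assumptions made only for illustration.

```python
MASK = (1 << 64) - 1  # 64-bit blocks, the same width as DES


def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def rotr(x, r):
    return ((x >> r) | (x << (64 - r))) & MASK


def toy_e(key, block):
    """Hypothetical stand-in for DES encryption of one block (NOT secure)."""
    return (rotl(block ^ key, 13) + key) & MASK


def toy_d(key, block):
    """Inverse of toy_e."""
    return rotr((block - key) & MASK, 13) ^ key


def cbc_encrypt(key, iv, blocks):
    """Ordinary single-key CBC encryption over a list of 64-bit ints."""
    prev, out = iv, []
    for p in blocks:
        prev = toy_e(key, p ^ prev)
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    """Ordinary single-key CBC decryption."""
    prev, out = iv, []
    for c in blocks:
        out.append(toy_d(key, c) ^ prev)
        prev = c
    return out


def one_loop_encrypt(keys, iv, blocks):
    """One pass over the data; E-D-E is applied inside the CBC chain."""
    k1, k2, k3 = keys
    prev, out = iv, []
    for p in blocks:
        prev = toy_e(k3, toy_d(k2, toy_e(k1, p ^ prev)))
        out.append(prev)
    return out


def one_loop_decrypt(keys, iv, blocks):
    k1, k2, k3 = keys
    prev, out = iv, []
    for c in blocks:
        out.append(toy_d(k1, toy_e(k2, toy_d(k3, c))) ^ prev)
        prev = c
    return out


def three_loop_encrypt(keys, ivs, blocks):
    """Three passes over the data; each pass is a complete CBC operation."""
    (k1, k2, k3), (iv1, iv2, iv3) = keys, ivs
    stage1 = cbc_encrypt(k1, iv1, blocks)
    stage2 = cbc_decrypt(k2, iv2, stage1)
    return cbc_encrypt(k3, iv3, stage2)


def three_loop_decrypt(keys, ivs, blocks):
    (k1, k2, k3), (iv1, iv2, iv3) = keys, ivs
    stage2 = cbc_decrypt(k3, iv3, blocks)
    stage1 = cbc_encrypt(k2, iv2, stage2)
    return cbc_decrypt(k1, iv1, stage1)
```

The block-cipher calls are the same in number either way: three per block.
What differs is the loop structure around them, and that is the part a
given implementation (or chip) ends up being tuned for.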
This very much reminds me of the story of a donkey who starved to
death because he was standing exactly midway between two bales of hay
and therefore couldn't decide which one to eat from. I would note that
the story could not be true because donkeys are not that stupid.
Are we?
--Charlie
(kaufman(_at_)zk3(_dot_)dec(_dot_)com)