The fastest hardware you can build (all on one VLSI
chip) will show the 3-loop case to be 3x faster than the 1-loop case.
There is no way to design it the other way around without artificially
slowing down the individual DES operations in the 3-loop case beyond
what each does in the 1-loop case.
This is simply not so. Since we are no longer constrained by existing
hardware (i.e., hardware folk will leap to implement PEM), let's look
at the basics:
           |
           v
       DES block
           |
          X-or <--+-----------------+
           |      |                 |
           v      |                 |
    Encrypt block | #1              |
           |      |                 |
           +------+                 |
           |                        |
          X-or <--+                 |
           |      |                 |
           v      |                 |
    Encrypt block | #2              |
           |      |                 |
           +------+                 |
           |                        |
          X-or <--+                 |
           |      |                 |
           v      |                 |
    Encrypt block | #3              |
           |      |                 |
           +------+-ALTERNATE PATH -+
           |
           v
Now we can eliminate the alternate path for EDE-CBC, or we can
eliminate the interior feedback chains for CBC-EDE. It seems that
marginally less hardware is required for EDE-CBC than for CBC-EDE;
therefore EDE-CBC is faster.
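
The two ways of pruning the diagram can be sketched in Python. This is
only an illustration of the dataflow, under loud assumptions:
toy_encrypt/toy_decrypt are a made-up invertible 64-bit mixing function
standing in for DES (it is not DES and not secure), and every function
name, key, and IV here is hypothetical. As in the diagram, all three
stages are drawn as encrypt blocks.

```python
MASK = (1 << 64) - 1
MULT = 0x9E3779B97F4A7C15           # arbitrary odd constant, invertible mod 2**64
MULT_INV = pow(MULT, -1, 1 << 64)   # modular inverse (needs Python 3.8+)

def toy_encrypt(key, block):
    """Stand-in for one DES encrypt block. NOT DES, NOT secure."""
    return (((block ^ key) * MULT) + key) & MASK

def toy_decrypt(key, block):
    """Inverse of toy_encrypt."""
    return ((((block - key) & MASK) * MULT_INV) & MASK) ^ key

def encrypt_interior_feedback(keys, ivs, blocks):
    """Alternate path removed: each stage keeps its own feedback loop."""
    fb = list(ivs)                  # one feedback register per stage
    out = []
    for p in blocks:
        x = p
        for i, k in enumerate(keys):
            x = toy_encrypt(k, x ^ fb[i])
            fb[i] = x               # this stage's output feeds its own X-or
        out.append(x)
    return out

def decrypt_interior_feedback(keys, ivs, blocks):
    fb = list(ivs)
    out = []
    for c in blocks:
        y = c
        for i in reversed(range(len(keys))):
            prev, fb[i] = fb[i], y  # remember this stage's output for next block
            y = toy_decrypt(keys[i], y) ^ prev
        out.append(y)
    return out

def encrypt_alternate_path(keys, iv, blocks):
    """Interior chains removed: one loop from final output to first X-or."""
    fb = iv
    out = []
    for p in blocks:
        x = p ^ fb
        for k in keys:
            x = toy_encrypt(k, x)
        fb = x                      # only the final output is fed back
        out.append(x)
    return out

def decrypt_alternate_path(keys, iv, blocks):
    fb = iv
    out = []
    for c in blocks:
        y = c
        for k in reversed(keys):
            y = toy_decrypt(k, y)
        out.append(y ^ fb)
        fb = c
    return out
```

Note that in the single-loop version the first X-or for block n needs
the output of stage #3 for block n-1, while in the interior-feedback
version each X-or needs only its own stage's previous output.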
You're forgetting that in hardware, each of the three encrypt blocks
can and would operate in parallel -- but can do that *only* if there
are three feedback loops. If you have only one feedback loop, then
the next 64-bit block of data cannot enter encrypt block #1 until the
previous one has come out of #3. So only one block can be inside the
three encrypt blocks at a time, and they cannot operate in parallel.
Therefore, the three-loop case operates 3x faster than the 1-loop case.
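
A minimal timing sketch of that argument, in Python. It assumes three
dedicated encrypt stages on the chip, each taking one time unit per
64-bit block, with all plaintext available at time 0; the function
names and the unit latency are assumptions made for illustration, not
figures for any real DES part.

```python
STAGES = 3  # the three encrypt blocks in the diagram

def finish_time_three_loops(n_blocks, stage_latency=1):
    """Each stage has its own feedback loop, so stage k can start block n
    as soon as (a) stage k-1 has delivered block n and (b) stage k itself
    has finished block n-1 (its own feedback value)."""
    done = [0] * STAGES          # time each stage finished its latest block
    for _ in range(n_blocks):
        ready = 0                # plaintext assumed available at time 0
        for k in range(STAGES):
            start = max(ready, done[k])
            done[k] = start + stage_latency
            ready = done[k]
    return done[-1]

def finish_time_one_loop(n_blocks, stage_latency=1):
    """Single feedback loop: stage 1 cannot start block n until stage 3
    has finished block n-1, so the stages run strictly one after another."""
    last_output = 0
    for _ in range(n_blocks):
        ready = last_output      # wait for the previous block's final output
        for _ in range(STAGES):
            ready += stage_latency
        last_output = ready
    return last_output

if __name__ == "__main__":
    n = 1000
    t3 = finish_time_three_loops(n)
    t1 = finish_time_one_loop(n)
    print(t3, t1, t1 / t3)
```

Under these assumptions n blocks take n + 2 time units with three loops
and 3n with one loop, so the ratio approaches 3 as the message gets
longer.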
Thank you for drawing the picture for me.
- Carl