Notice that 'CBC-EDE' mode (which we all agree is in principle faster
if you're using 3 DES chips in parallel) is the slowest with these
particular software implementations.
I don't believe that we all agree; we just have one participant who makes
a lot of noise about 3-loop CBC. I have never heard anyone else support it.
The fastest hardware you can build (all on one VLSI chip) will show
the 3-loop case to be 3x faster than the 1-loop case.
There is no way to design it the other way without artificially slowing
down the individual DES operations in the 3-loop case, beyond what each
does in the 1-loop case.
This is simply not so. Since we are no longer considering existing hardware
(i.e., hardware folk will leap to implement PEM), let's look at the basics:
       |
       v
   DES block
       |
     X-or <-----+-----------------+
       |        |                 |
       v        |                 |
 Encrypt block  | #1              |
       |        |                 |
       +--------+                 |
       |                          |
     X-or <-----+                 |
       |        |                 |
       v        |                 |
 Encrypt block  | #2              |
       |        |                 |
       +--------+                 |
       |                          |
     X-or <-----+                 |
       |        |                 |
       v        |                 |
 Encrypt block  | #3              |
       |        |                 |
       +--------+-ALTERNATE PATH -+
       |
       v
Now we can eliminate the alternate path for CBC-EDE, or we can
eliminate the interior feedback chains for EDE-CBC. It seems that
marginally less hardware is required for EDE-CBC than for CBC-EDE,
therefore EDE-CBC is faster. The problem is all in the assumptions.
I can design hardware so that either is faster, just as software can be
written so that either is faster. (E.g., since X-or is slower than Encrypt,
I can pipeline the X-or and the two methods are equivalent.) Also, I can
design hardware to activate or deactivate any of the steps, or to put any
key in any block.
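
To make the two wirings concrete, here is a minimal sketch in Python. It
is not DES: toy_encrypt and toy_decrypt are a hypothetical invertible
64-bit stand-in, used only so the chaining can be run without a crypto
library. The function names are mine, and I am assuming the naming used
in the rest of this thread: "EDE-CBC" is one CBC loop around the whole
EDE, and "CBC-EDE" is the 3-loop case with a feedback path around each
block. The three separate IVs for the 3-loop case are also an assumption.

```python
# Toy stand-in for a 64-bit block cipher. NOT DES and NOT secure; it only
# needs to be invertible so the feedback wiring can be exercised.
MASK = (1 << 64) - 1
MULT = 0x9E3779B97F4A7C15            # odd, so invertible modulo 2**64
MULT_INV = pow(MULT, -1, 1 << 64)    # modular inverse (Python 3.8+)

def toy_encrypt(key, block):
    return (((block ^ key) * MULT) + key) & MASK

def toy_decrypt(key, block):
    return ((((block - key) & MASK) * MULT_INV) & MASK) ^ key

def ede_cbc_encrypt(keys, iv, blocks):
    """1-loop case: a single X-or and feedback path around the whole EDE."""
    k1, k2, k3 = keys
    prev, out = iv, []
    for p in blocks:
        prev = toy_encrypt(k3, toy_decrypt(k2, toy_encrypt(k1, p ^ prev)))
        out.append(prev)
    return out

def ede_cbc_decrypt(keys, iv, blocks):
    k1, k2, k3 = keys
    prev, out = iv, []
    for c in blocks:
        out.append(toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k3, c))) ^ prev)
        prev = c
    return out

def cbc_ede_encrypt(keys, ivs, blocks):
    """3-loop case: each block has its own X-or and its own feedback path."""
    k1, k2, k3 = keys
    f1, f2, f3 = ivs
    out = []
    for p in blocks:
        f1 = toy_encrypt(k1, p ^ f1)     # block #1
        f2 = toy_decrypt(k2, f1 ^ f2)    # block #2 (the "D" of EDE)
        f3 = toy_encrypt(k3, f2 ^ f3)    # block #3
        out.append(f3)
    return out

def cbc_ede_decrypt(keys, ivs, blocks):
    k1, k2, k3 = keys
    f1, f2, f3 = ivs
    out = []
    for c in blocks:
        x3 = toy_decrypt(k3, c) ^ f3     # undo block #3
        x2 = toy_encrypt(k2, x3) ^ f2    # undo block #2
        x1 = toy_decrypt(k1, x2) ^ f1    # undo block #1
        f1, f2, f3 = x2, x3, c           # feedback values for the next block
        out.append(x1)
    return out
```

Both encrypt routines make the same three cipher calls per block; they
differ only in where the X-or and the feedback register sit, which is the
sense in which hardware for one is a small increment over the other. In
the 1-loop routine, block #1 cannot start on the next input until block #3
has finished the current one; in the 3-loop routine each block waits only
on its own previous output.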
The major point to make is that PEM is not yet real and banking is real,
so I will design hardware to implement EDE in codebook for the bankers,
and if I think it would not cost too much, I might include a CBC loop
in hardware around the whole thing. Even if PEM did choose CBC-EDE,
I will still need to implement straight codebook EDE. Therefore, if
you wish to encourage any hardware developer to implement your algorithms,
you must either:
1> give him a VERY big market, or
2> make the incremental implementation cost small.
Since no one will guarantee 1>, I suggest you standardize on EDE-CBC
now, and just maybe 2> will produce a fast hardware implementation.
Peace ..Tom Jones