In an earlier note I discussed the overall structure of DES calls
for CBC and EDE. In this note I will add detail to my even earlier
comments on the notion that I/O considerations outweigh those of
the DES algorithm itself.
The DES algorithm, when implemented most efficiently, takes 18 cycles
to complete: 16 rounds plus the permutations at beginning and end. If
CBC is required, 1 or 2 additional cycles are needed, depending on
the number of 64 bit registers in the design. In a typical chip with
a single 8 bit interface (e.g. the WD20C03), we need 8 cycles for
input and 8 for output plus approximately 5 cycles for register
propagation, about 21 cycles in all. So for this type of implementation,
I/O takes longer than the DES algorithm itself. It is possible to
improve throughput by adding busses, but this would complicate a
design that needed to support multiple modes and probably add register
propagation delays as well. If the DES operation were run, as is, for
3 iterations, the chip would be slowed down by at most about 50% (75
cycles per block instead of 39), although cycles could be saved by
eliminating the useless interior permutations.
In any case, with a 20 megahertz part, single DES would require about
2 microseconds and triple DES about 4, both with one complete I/O cycle.
At 64 bits per operation this is about 4 and 2 megabytes/second
respectively.
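
The cycle arithmetic above can be checked with a short Python sketch.
The cycle counts and the 20 megahertz clock are the figures from this
note; the function names are only illustrative, and the sketch assumes
that I/O and DES cycles simply add, with no overlap between them.

```python
DES_CYCLES = 18        # 16 rounds plus initial and final permutation
IO_CYCLES = 8 + 8 + 5  # input, output, approximate register propagation
CLOCK_HZ = 20_000_000  # 20 megahertz part
BLOCK_BYTES = 8        # one 64 bit DES block


def block_cycles(des_passes, cbc_cycles=0):
    """Cycles per block: DES passes, optional CBC cycles, one I/O cycle."""
    return des_passes * DES_CYCLES + cbc_cycles + IO_CYCLES


def block_time_us(des_passes, cbc_cycles=0):
    """Time per block in microseconds (assumes no I/O and DES overlap)."""
    return block_cycles(des_passes, cbc_cycles) * 1e6 / CLOCK_HZ


def throughput_mb_per_s(des_passes, cbc_cycles=0):
    """Bytes per microsecond, which equals megabytes per second."""
    return BLOCK_BYTES / block_time_us(des_passes, cbc_cycles)


for name, passes in (("single DES", 1), ("triple DES", 3)):
    print(name, block_cycles(passes), "cycles,",
          round(block_time_us(passes), 2), "us,",
          round(throughput_mb_per_s(passes), 2), "MB/s")
```

Under these assumptions single DES comes to 39 cycles and triple DES to
75, which is where the figures of roughly 2 and 4 microseconds come from.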
I/O on most workstations (which is where most email agents are presumed
to reside) will run from 8 megahertz up, with widths of 8 or 16 bits. But
a fast I/O bus like SCSI will only operate at 4 megabytes/sec on a good
day, and both are half-duplex protocols. Since each block has to cross
the bus twice, once in and once out, 1.5 megabytes/second total
throughput would be a high-side performance goal for a typical
workstation. Better results would be obtained from putting the
DES chip on the motherboard, but that is not a likely scenario. Higher
speed I/O is planned for display or RAM, but the number of slots for that
is likely to be limited.
The result is an argument about a 4 megabyte/second versus a 2
megabyte/second DES chip on a workstation that cannot support more than
1.5 megabytes/second total throughput.
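
A rough Python sketch of that comparison follows. The function name and
the bus_efficiency parameter are hypothetical; the value 0.75 is my own
assumption, chosen only so that a 4 megabyte/second half-duplex bus
yields the 1.5 megabyte/second ceiling quoted above. The sketch also
assumes the chip and the bus are the only two limits.

```python
def delivered_mb_per_s(chip_mb_per_s, bus_mb_per_s, bus_efficiency=0.75):
    """Throughput actually seen by the host, limited by chip or by bus.

    Every block crosses the half-duplex bus twice (in and out), so the
    bus can carry at most half its raw rate in useful blocks.
    bus_efficiency is an assumed allowance for protocol overhead.
    """
    bus_ceiling = bus_mb_per_s / 2 * bus_efficiency
    return min(chip_mb_per_s, bus_ceiling)


SCSI_MB_PER_S = 4.0  # "on a good day", per the note

single_des = delivered_mb_per_s(4.0, SCSI_MB_PER_S)  # 4 MB/s chip
triple_des = delivered_mb_per_s(2.0, SCSI_MB_PER_S)  # 2 MB/s chip
print(single_des, triple_des)
```

With these numbers both chips deliver the same rate to the host, because
the bus ceiling sits below either chip's own throughput.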
Repeating my point from an earlier message, with the volume market from
banking for codebook EDE, general purpose DES boards are not likely to
support peculiar DES configurations without a large guaranteed market.
Add that to the fact that *no* DES chip manufacturer of 3 years ago still
makes DES chips, and I would encourage PEM to stick with the larger market
designs and use EDE with whatever chaining method is convenient.
- - -
I will also repeat an assertion I made on some other thread, that chaining
seems to be an inappropriate solution for PEM. Steve Kent observed that
chaining hides 8 byte cycle repetitions, but there are better methods to
do that, such as compression, which even gives both cryptographic and
transmission cost benefits that chaining cannot match. So if throughput
is really the issue, re-evaluate chaining rather than the existing EDE
algorithm.
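
As a small illustration of the repetition point, the Python sketch below
counts repeated 8 byte blocks in a message before and after compression.
In codebook mode identical plaintext blocks produce identical ciphertext
blocks, so repeats in the input are what an observer would see. zlib is
only a stand-in compressor, the sample message is invented, and one
example does not show that compression removes repetition in general.

```python
import zlib
from collections import Counter


def repeated_blocks(data, block_size=8):
    """Count blocks that duplicate an earlier full block of the same size."""
    blocks = [data[i:i + block_size]
              for i in range(0, len(data) - block_size + 1, block_size)]
    counts = Counter(blocks)
    return sum(n - 1 for n in counts.values() if n > 1)


# Invented sample: an 8 byte pattern repeated 64 times, then varied text.
message = b"HEADER: " * 64 + b"body text that varies a little. "
compressed = zlib.compress(message)

print("plaintext:  ", len(message), "bytes,",
      repeated_blocks(message), "repeated blocks")
print("compressed: ", len(compressed), "bytes,",
      repeated_blocks(compressed), "repeated blocks")
```

The compressed form is also shorter, which is the transmission cost
benefit mentioned above.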
I have heard others give different reasons for adding chaining, but these
all relate to integrity, which can best be achieved by the MIC.
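
To make the integrity point concrete, here is a minimal Python sketch of
a check computed over the whole message, independent of any chaining
mode. SHA-256 is used purely as a stand-in digest and is not one of the
PEM MIC algorithms; in PEM the MIC is additionally protected, which this
bare digest is not. The function names and sample message are invented.

```python
import hashlib
import hmac


def mic(message):
    """Stand-in integrity check: a digest over the entire message."""
    return hashlib.sha256(message).digest()


def verify(message, expected_mic):
    """True if the message still matches the check value it was sent with."""
    return hmac.compare_digest(mic(message), expected_mic)


original = b"PAY TO: ALICE   AMOUNT: 0000100 "
tag = mic(original)

# Swap the first two 8 byte blocks, the kind of rearrangement that
# codebook mode alone would not reveal.
tampered = original[8:16] + original[0:8] + original[16:]

print(verify(original, tag), verify(tampered, tag))
```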
Peace ..Tom Jones