Date: Fri, 28 May 93 13:18 EDT
From: TCJones@DOCKMASTER.NCSC.MIL
Subject: DES wonk's delite
Message-Id: <930528171817.993820@DOCKMASTER.NCSC.MIL>
I would not be so quick to cede the speed advantage to three-pass CBC over
EDE2 with chaining. Most of the time in high-speed cryptographic processing
with moderate- to high-speed DES chips is consumed by I/O; that is, getting
data in and out of the DES chips is the most significant time cost in a
hardware environment. Given that, three-pass CBC would be substantially
slower than EDE2 (or EDE3 with the CEI chip) even if the chaining had to be
performed externally. Don't forget that chaining is just an XOR operation
that most CPUs can perform with great facility.
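
(As a minimal sketch of that point, in C: the chaining around a hardware DES
call is just one 8-byte XOR per block. des_chip_encrypt() below is a
hypothetical stand-in for a real DES-chip driver call, present only so the
fragment compiles; the timing argument does not depend on it.)

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for the hardware DES call; a real driver would
     * feed the 8-byte block to the chip here.  Placeholder only, NOT DES. */
    void des_chip_encrypt(const uint8_t in[8], uint8_t out[8])
    {
        memcpy(out, in, 8);
    }

    /* External CBC chaining: one 8-byte XOR per block on the host CPU,
     * followed by the I/O-dominated trip through the DES chip. */
    void cbc_encrypt(const uint8_t *plain, uint8_t *cipher,
                     size_t nblocks, const uint8_t iv[8])
    {
        uint8_t feedback[8];
        memcpy(feedback, iv, 8);

        for (size_t i = 0; i < nblocks; i++) {
            uint8_t tmp[8];
            for (int j = 0; j < 8; j++)          /* the chaining step itself */
                tmp[j] = plain[8*i + j] ^ feedback[j];

            des_chip_encrypt(tmp, &cipher[8*i]); /* on/off-chip transfer dominates */
            memcpy(feedback, &cipher[8*i], 8);   /* next block chains off this one */
        }
    }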
Sorry, but you're mistaking total latency for throughput by ignoring the
pipelining of the (DES-CBC)**3 case. [The 3 DES chips, in this case, are
operating simultaneously on 3 different 8-byte blocks of input. They cannot
do so in the (DES**3)-CBC case.]
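
(A toy C sketch of that occupancy claim, not any real driver code: each "slot"
below stands for one on/off-chip transfer plus one on-chip DES-CBC, i.e.
roughly x + v in the notation that follows. Once the pipeline is full, all
three chips hold different blocks in the same slot.)

    #include <stdio.h>

    int main(void)
    {
        const int nblocks = 6;

        printf("slot   chip1    chip2    chip3\n");
        for (int slot = 0; slot < nblocks + 2; slot++) {
            int b1 = slot;        /* block in its first CBC pass          */
            int b2 = slot - 1;    /* block in its second CBC pass         */
            int b3 = slot - 2;    /* block in its third (final) CBC pass  */

            printf("%4d   ", slot);
            if (b1 >= 0 && b1 < nblocks) printf("blk %-2d   ", b1); else printf("idle     ");
            if (b2 >= 0 && b2 < nblocks) printf("blk %-2d   ", b2); else printf("idle     ");
            if (b3 >= 0 && b3 < nblocks) printf("blk %-2d\n",  b3); else printf("idle\n");
        }
        return 0;
    }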
Assume it takes x nsec to get data on and off a chip (whether pure DES, EDE2,
or XOR), y nsec to do a single DES, z nsec to do EDE2, w nsec to do an XOR,
and v nsec to do DES-CBC on-chip.
Scheme          Total Latency     Throughput
EDE2-CBC        2x + z + w        1/(2x + z + w)
(DES**3)-CBC    4x + 3y + w       1/(4x + 3y + w)
(DES-CBC)**3    3x + 3v           1/(x + v)
If x dominates these times, then clearly (DES-CBC)**3 wins the throughput race.
[I'd expect z to be very close to 3y and y to be nearly equal to v.]
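
(To make that concrete, here is the same arithmetic with assumed round
numbers; these figures are illustrative guesses, not measurements of any
particular chip: x = 500 ns of on/off-chip I/O, y = v = 150 ns for a single
DES or on-chip DES-CBC, z = 3y, w = 50 ns for an external XOR.)

    #include <stdio.h>

    int main(void)
    {
        const double x = 500, y = 150, v = 150, z = 3*y, w = 50;  /* nsec, assumed */

        double t_ede2 = 2*x + z + w;    /* EDE2-CBC     : time per block            */
        double t_des3 = 4*x + 3*y + w;  /* (DES**3)-CBC : time per block            */
        double t_pipe = x + v;          /* (DES-CBC)**3 : per block, pipeline full  */

        /* 8-byte block, times in ns, so Mbyte/s = 8000 / t */
        printf("EDE2-CBC     : %4.0f ns/block -> %5.2f Mbyte/s\n", t_ede2, 8e3/t_ede2);
        printf("(DES**3)-CBC : %4.0f ns/block -> %5.2f Mbyte/s\n", t_des3, 8e3/t_des3);
        printf("(DES-CBC)**3 : %4.0f ns/block -> %5.2f Mbyte/s\n", t_pipe, 8e3/t_pipe);
        return 0;
    }

With those guesses, the pipelined (DES-CBC)**3 delivers roughly twice the
throughput of EDE2-CBC and nearly four times that of (DES**3)-CBC, even
though each individual block still takes 3x + 3v to get all the way through.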
The same equations apply to on-chip implementations (by setting x = 0).
The same logic applies to purely S/W implementations (slightly different
equations, of course).
- Carl