pem-dev

Re: DES wonk's delite

1993-05-30 12:23:00
Date:  Fri, 28 May 93 13:18 EDT
From: TCJones(_at_)DOCKMASTER(_dot_)NCSC(_dot_)MIL
Subject:  DES wonk's delite
Message-Id:  <930528171817(_dot_)993820(_at_)DOCKMASTER(_dot_)NCSC(_dot_)MIL>

I would not be so quick to cede the speed field to three-pass CBC as to
EDE2 with chaining.  Most of the time used in high speed cryptographic
processes using moderate to high speed DES chips is consumed by I/O.
That is, getting data in and out of the DES chips is the most
significant time consumer in a hardware environment.  Given that, the
three-pass CBC would be substantially slower than EDE2 (or EDE3 with the
CEI chip) even if the chaining needed to be performed externally.  Don't
forget that chaining is just an XOR operation that most CPUs can do
with great facility.

Sorry, but you're mistaking total latency for throughput by ignoring the
pipelining of the (DES-CBC)**3 case.  [The 3 DES chips, in this case, are
operating simultaneously on 3 different 8-byte groups of input.  They
cannot do so in the (DES**3)-CBC case.]
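
To make the latency/throughput distinction concrete, here is a minimal
sketch of when each 8-byte block leaves the last chip.  The function
name, the 100 nsec stage time and the 3-stage layout are illustrative
assumptions, not measurements of any real DES chip.

```python
def completion_times(n_blocks, stage_ns, n_stages, pipelined):
    """Return the time (nsec) at which each block leaves the last stage.

    stage_ns and n_stages are hypothetical; every stage is assumed to
    take the same time.
    """
    latency = stage_ns * n_stages
    if pipelined:
        # (DES-CBC)**3: a new block enters chip 1 as soon as the
        # previous block has left chip 1.
        return [latency + i * stage_ns for i in range(n_blocks)]
    # (DES**3)-CBC: block i+1 must wait for block i's final ciphertext,
    # because that ciphertext is XORed into the next input.
    return [latency * (i + 1) for i in range(n_blocks)]


piped = completion_times(4, 100, 3, pipelined=True)
serial = completion_times(4, 100, 3, pipelined=False)
print(piped)   # [300, 400, 500, 600]
print(serial)  # [300, 600, 900, 1200]
```

Both arrangements deliver the first block after the same 300 nsec, but
the pipelined one then delivers a block every stage time rather than
every full latency.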

Assume it takes x nsec to get on and off chip (either pure DES, EDE2 or XOR),
y nsec to do the single DES, z nsec to do EDE2 and w nsec to do XOR and v nsec
to do DES-CBC on-chip.

Scheme          Total Latency           Throughput

EDE#-CBC        2x+z+w                  1/(2x+z+w)
(DES**3)-CBC    4x+3y+w                 1/(4x+3y+w)
(DES-CBC)**3    3x+3v                   1/(x+v)

If x dominates these times, then clearly (DES-CBC)**3 wins the throughput race.
[I'd expect z to be very close to 3y and y to be nearly equal to v.]
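
The table can be evaluated directly.  The sketch below plugs in made-up
timings in which x dominates and z = 3y, y = v as guessed above; the
function name and every number are assumptions chosen only for
illustration.

```python
def model(x, y, z, w, v):
    """Return {scheme: (latency_nsec, blocks_per_nsec)} using the
    formulas from the table above."""
    ede = 2 * x + z + w
    des3 = 4 * x + 3 * y + w
    return {
        "EDE#-CBC": (ede, 1 / ede),
        "(DES**3)-CBC": (des3, 1 / des3),
        "(DES-CBC)**3": (3 * x + 3 * v, 1 / (x + v)),
    }


# Hypothetical timings in nsec: I/O (x) dominates, z = 3y, y = v.
off_chip = model(x=1000, y=100, z=300, w=10, v=100)
# Same timings with everything on one chip (x = 0).
on_chip = model(x=0, y=100, z=300, w=10, v=100)

for name, (latency, throughput) in off_chip.items():
    print(f"{name:14s} latency={latency:5d}  throughput={throughput:.6f}")
```

With these numbers (DES-CBC)**3 has a total latency of 3300 nsec,
longer than EDE#-CBC at 2310 nsec, yet it finishes a block every 1100
nsec, so its throughput is the highest of the three.  It stays highest
in the x = 0 case as well.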

The same equations apply to on-chip implementations (by setting x = 0).
The same logic applies to purely S/W implementations (slightly different
equations, of course).

 - Carl
