Subject: Re: Triple DES on PEM WG Agenda
Date: Fri, 18 Jun 93 23:01:17 +0100
From: Mike Roe <Michael(_dot_)Roe(_at_)cl(_dot_)cam(_dot_)ac(_dot_)uk>
Message-Id: <"swan.cl.cam.:197160:930618220128"@cl.cam.ac.uk>
Mike,
> My personal preference would be for
>   a) k1 = k3
>   b) a single feedback loop.
It's clear that this is still a religious war -- a value system
conflict -- and therefore not something which can be solved by debate.
Number of bits of key is not a big issue to me, either. 112 is fine.
Performance of CBC is a big issue for me.
> I have yet to see much convincing evidence that the single feedback loop
> is going to be a performance problem on the vast majority of platforms.
This must be a value system conflict. To me, a full factor of 3 in
performance for future VLSI implementations is very convincing.
Did you read that message of mine giving the performance analysis, about a
month ago?
> If you're doing DES in software, it doesn't make any difference.
This claim is false. Software implementations won't see the full factor of
3 that full-VLSI triple-DES will see, but there is a difference. For
example, Stratus DES software is optimized to do CBC over blocks of
plaintext -- saving procedure call and setup overhead which would otherwise
have to be repeated for each 64-bit word in the block. That overhead
doesn't dominate the execution time but it is significant enough that it is
well worth calling the CBC_block routine rather than doing multiple calls
to the single DES routine.
> Also, most of the DES chips I've looked at will do triple-DES in chaining
> mode.
Hold it. We may be talking about two different things here.
You might be describing #1 when I thought people were describing #2 (a chip
I've never seen advertised, much less in the flesh).
1. a chip which reads in 64 bits of data;
   holds 3 key schedules;
   has 1 set of DES logic;
   uses that DES logic three times on the data, once with each key;
   presents the output 64 bits of data

or

2. a chip which reads in 64 bits of data;
   has 3 sets of DES logic internally, each with its own key schedule;
   passes the internal data to the first set of DES logic and
   gets a second input datum while passing data from 1st DES to 2nd
   and 2nd to 3rd;
   presents data which has finished the 3rd stage for output.
It is the latter which I was talking about and which I assumed people had
meant when they talked about a chip doing triple-DES internally.
The former cannot achieve the speed of 3 DES chips in sequence. The
latter can.
The former could implement CBC as 3 loops, just as the latter would have
to in order to preserve performance.
The former wouldn't get any performance advantage out of 3-loop CBC so
the temptation would be to do only 1 loop -- but that algorithm choice
limits future implementations to the low performance which the former
chip suffers because it used only a single set of DES logic. The reasoning
is circular: current implementation doesn't gain performance from 3 loops
and 3 loops costs more in silicon, so don't do it. You then propose doing
the pipelined implementation in silicon and someone points out that the
single CBC loop prevents pipelining, so no one ever builds the pipelined
chip.
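
To make the two CBC structures concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: toy_stage
is a hypothetical stand-in for one keyed DES stage (it is NOT DES, and the
E-D-E ordering is not modelled), and the function names and the use of one
IV per loop in the three-loop case are not taken from any PEM document.
The point is only where the feedback goes.

```python
MASK = (1 << 64) - 1  # work on 64-bit blocks, as DES does


def toy_stage(key: int, block: int) -> int:
    """Hypothetical invertible 64-bit keyed permutation. NOT DES."""
    x = (block ^ key) & MASK
    x = (x * 0x9E3779B97F4A7C15) & MASK    # odd multiplier: invertible mod 2**64
    return ((x << 13) | (x >> 51)) & MASK  # rotate left by 13 bits


def single_loop_cbc(keys, iv, blocks):
    """One feedback loop around all three stages."""
    out, prev = [], iv
    for p in blocks:
        x = p ^ prev              # needs the FINAL output of the previous block
        for k in keys:
            x = toy_stage(k, x)
        prev = x
        out.append(x)
    return out


def three_loop_cbc(keys, ivs, blocks):
    """Three feedback loops, one around each stage."""
    prevs = list(ivs)             # one chaining value per stage
    out = []
    for p in blocks:
        x = p
        for i, k in enumerate(keys):
            x = toy_stage(k, x ^ prevs[i])  # needs only this stage's last output
            prevs[i] = x
        out.append(x)
    return out
```

In single_loop_cbc the first stage cannot start on block i+1 until the
third stage has finished block i. In three_loop_cbc each stage waits only
for its own previous output, so three sets of DES logic can work on three
different blocks at once.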
This kind of "it won't hurt us in our current implementation" thinking has
bugged me for years in performance analysis. You end up with lots of
decisions which didn't matter in old technology but which, for
compatibility reasons, clobber the performance of new technology.
The appeal I'm making to the PEM-DEV community is not to ignore performance
when defining triple-DES-CBC -- or else not to try defining triple-DES-CBC
at all.
What I'm saying is that as far as I know, *no one* has defined CBC mode for
triple-DES, so if PEM tries to do it, that will be the first definition --
and it needs to be right. That decision will live for as long as PEM
lives (a long time, we hope :-).
This would have too great a future impact to be done wrong -- and a single
feedback loop around the three DESs is *wrong* -- not a matter of opinion
-- not a matter of taste -- wrong. It underperforms by a factor of 3
because it defeats the natural pipelining possible in the three-loop case.
There is no getting around it.
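
The factor of 3 can be checked with a small scheduling model. This is a
sketch under stated assumptions, not a measurement: it assumes a chip with
three sets of DES logic, each taking one unit of time per block, and it
ignores I/O and key setup. The function name finish_time and its
parameters are hypothetical.

```python
def finish_time(n_blocks, stages=3, single_loop=False, stage_time=1):
    """Time at which the last block leaves the last stage."""
    if n_blocks == 0:
        return 0
    # done[i][s] = time at which block i leaves stage s
    done = [[0] * stages for _ in range(n_blocks)]
    for i in range(n_blocks):
        for s in range(stages):
            deps = []
            if s > 0:
                deps.append(done[i][s - 1])       # data from the previous stage
            if i > 0:
                # The stage must be free; in the three-loop case this is
                # also when its own feedback value becomes available.
                deps.append(done[i - 1][s])
                if single_loop and s == 0:
                    # Single loop: stage 1 needs the final output of block i-1.
                    deps.append(done[i - 1][stages - 1])
            done[i][s] = max(deps, default=0) + stage_time
    return done[-1][-1]


three = finish_time(1000)                    # n + 2 time units
single = finish_time(1000, single_loop=True)  # 3 * n time units
print(three, single, single / three)
```

Under these assumptions 1000 blocks take 1002 time units with three loops
and 3000 with a single loop, so the ratio approaches 3 as the message gets
longer.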
- Carl