
Re: hand huffman encoding at PGP world HQ

1997-11-24 06:31:57

Hal Finney <hal(_at_)rain(_dot_)org> writes:
It is true that Phil Zimmermann has been the chief advocate of making
PGP data structures as concise as possible. [...]

First, the difficulty is really not as large as people are making out.
The code to read the packet lengths in the various cases is less than a
page, and it's very simple.  Read a byte to see if it is the old or new
packet formats.  In the old case, read 1, 2, or 4 bytes to get the length;
in the new case, read 1 byte and if it's 0xc0 or over, read a second.
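To make the "less than a page" claim concrete, here is a sketch of the new-format case against a memory buffer (the function name is mine; the partial and five-octet forms are omitted for brevity):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: decode a new-format body length from buf.
   One octet below 0xc0 is the length itself; 0xc0 and over means
   a second octet follows, encoding lengths 192..8383 as
   ((first - 192) << 8) + second + 192.  Stores octets consumed
   in *used and returns the decoded length. */
uint32_t decode_new_len(const unsigned char *buf, size_t *used)
{
    if (buf[0] < 0xc0) {            /* one-octet length: 0..191 */
        *used = 1;
        return buf[0];
    }
    *used = 2;                      /* two-octet length: 192..8383 */
    return ((uint32_t)(buf[0] - 0xc0) << 8) + buf[1] + 0xc0;
}
```

So the octet pair 0xc5 0xfb, for example, decodes to a body length of 1723.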

The issue is the complexity and the code bloat.

So with universal 32 bit length values, the code looks like:

word32 len;
char ctb = getc(fp);
read(fp, &len, 4);
len = ntohl(len);

With bit twiddling the code looks more like:

word32 len = 0;
char *lchar = (char *)&len;
char ctb = getc(fp);
int lol = ctb & 3;
ctb &= ~3;
if (lol == 0) { read(fp,lchar+3,1); }
else if (lol == 1) { read(fp,lchar+2,2); }
else if (lol == 2) { read(fp,lchar,4); }
len = ntohl(len);
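For what it's worth, the same old-format parse can be written against a memory buffer so it can actually be compiled and checked (a sketch, not the code above -- real code reads from the file; lol == 3, indeterminate length, is not handled):

```c
#include <stdint.h>
#include <stddef.h>

/* Old-format CTB: the low two bits give the length-of-length.
   Decode from a buffer; returns the body length and sets *used
   to the number of octets consumed, including the CTB itself. */
uint32_t decode_old_len(const unsigned char *p, size_t *used)
{
    int lol = p[0] & 3;
    uint32_t len = 0;
    if (lol == 0) { len = p[1]; *used = 2; }                     /* 1-byte len */
    else if (lol == 1) {                                         /* 2-byte len */
        len = ((uint32_t)p[1] << 8) | p[2];
        *used = 3;
    }
    else if (lol == 2) {                                         /* 4-byte len */
        len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16)
            | ((uint32_t)p[3] << 8)  |  (uint32_t)p[4];
        *used = 5;
    }
    return len;
}
```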

Seems to me there is more likely to be an error in the second set of
code (probably is -- I haven't compiled it), also that the second
example will be larger, and the second example took about 4 times
longer to code.  Multiply by the other bit twiddling operations (the
new ">= 192" length twiddling, proprietary floating point stuff, and
the same kind of philosophy applied throughout), stir in some
programmer error,
and there is significant extra complexity and code bloat, adding to
development time, creating errors, and dissuading people from coding
to OpenPGP.  The code bloat will affect smart card implementations.

The payoff is that you have slightly reduced the size of all the PGP
exchanged data.  By itself it is not so much, but as a philosophy applied
throughout the design, these savings add up.  The alternative philosophy
of making everything 32 or 64 bits can lead to bloated data structures.
It's easier on the coder, but every user pays the cost in terms of
larger data.

I don't think it's that big a deal.  The only place where size at the
bit scrimping level matters much is signatures.  I reckon use of 32
bit ints will add at max 3 bytes to the length of the signature.  That
won't even cause an additional line wrap on your DSS sig example.  The
longer keyID will be much more significant.

Version: PGP for Business Security 5.5.2


(He'd like to see the -----END line go away, too, since it is redundant.)
You can hardly imagine a more concise signature block than this.

If you really want some nasty hacks to save space, I'm sure I could
construct some -- I am something of a past master at hand huffman
encoding cf the RSA in perl and dc signature -- an implementation of
RSA which is smaller than most PGP signatures :-)

Compare that with SMIME.  If you've seen an SMIME signed message
in any public forum, you've probably also seen complaints about it.
The signature block is about 50+ lines long.  

Yes, David Sternlight just posted one to this forum.  I had just
previously picked apart an X509 signature made by a colleague -- 4350
bytes, and most of it turned out to be text disclaimers and legalese.

ISO 7816-9 is being re-written to come up with a smaller x509 style
certificate because the project involved couldn't fit three of the
x509 bloaters on their cards!  So that is an example of at least one
standard which might find OpenPGP interesting, and proof that this
issue is a problem.

Now, this is largely a design error in the client software, in that
it is including the whole certificate chain in the signature.  But
it is probably going to be a factor in slowing acceptance of SMIME.
Nobody likes a technology which brings them tons of flames every
time they send a clearsigned message.


Another neat trick to remove this problem is what mailcrypt used to
offer: to put the PGP signature in an X-signature header (I can't find
it in the docs for v3.4 -- he may have removed it).  (Course you
wouldn't want an X509 sig in your headers -- the huge header might
crash some mailers.)

PGP's philosophy has been one of being willing to pay a cost in
implementation complexity in order to reduce objectionable aspects
of the software to end users.  Part of that has been making the data
structures as compact as possible.

I am confident I can shave a few more percent or so off your current
data structures... you really want to see the hacks?

Let me make one technical point, in case all this philosophizing is
unpersuasive.  Part of the purpose of the new packet length headers is
to allow "one pass" processing.  You can put out a packet length header
which says "will be followed by 2K of data, then another packet length
header".  This allows you to start emitting data even when you don't
know the total length of the data in advance, and also allows lengths
greater than 2^32 bytes.
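A sketch of the writer's side of this, under the "224 + n announces a 2^n-octet chunk" style of encoding (function names are mine, not the draft's):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of one-pass emission.  A partial-length octet
   of 224 + n says "a chunk of 2^n octets follows, then another length
   header"; the final chunk carries an ordinary length. */

/* Header octet announcing a 2^n-octet partial chunk (n = 11 for 2K). */
unsigned char partial_hdr(unsigned n)
{
    return (unsigned char)(224 + n);
}

/* One- or two-octet header for the final chunk; returns header size. */
size_t encode_final_len(size_t n, unsigned char hdr[2])
{
    if (n < 192) {                  /* one-octet length */
        hdr[0] = (unsigned char)n;
        return 1;
    }
    hdr[0] = (unsigned char)(((n - 192) >> 8) + 192);
    hdr[1] = (unsigned char)((n - 192) & 0xff);
    return 2;
}
```

So a writer streams partial_hdr(11), 2048 octets, partial_hdr(11), 2048 octets, ... and finishes with encode_final_len() for whatever remains -- never needing the total length up front.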

One pass processing is a useful feature -- perhaps we could do
something simple like SSLv3 does -- a 16k maximum record and a simple
16 bit length.  No need for the ">= 192 means something special"
hacking -- I don't think the 1 byte of bloat will be a problem.

This is one of several technical changes which are designed to allow
one-pass processing.  Yesterday I explained how the new ascii armor header
line and the one-pass signature packets also supported this capability.
It's an important feature which allows software to get by without the use
of temporary files.

Any changes to another packet length header format should provide some
way of allowing one-pass processing.  A simple 32 bit length will not
be adequate.

You could have either a 16 bit length field or a 32 bit length field.
You can still have the continuation feature; it is a useful and
largely separable feature.
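One way to get both -- a hedged sketch, names and layout mine: a fixed three-octet header, one flag octet whose low bit means "another segment follows", plus a plain 16 bit big-endian length (an SSLv3-style 16k maximum fits comfortably):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical segment header: [flags][len_hi][len_lo].
   Bit 0 of flags set means "more segments follow" -- the
   continuation feature, separated from the length encoding. */
size_t put_segment_hdr(unsigned char hdr[3], uint16_t len, int more)
{
    hdr[0] = more ? 1 : 0;
    hdr[1] = (unsigned char)(len >> 8);   /* big-endian length */
    hdr[2] = (unsigned char)(len & 0xff);
    return 3;
}
```

No special-case octet values, and a reader's parse is a fixed-size read followed by two shifts.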

Now officially an EAR violation...
Have *you* exported RSA today? -->

print pack"C*",split/\D+/,`echo "16iII*o\U(_at_){$/=$z;[(pop,pop,unpack"H*",<>