Re: Text canonicalization


At 7:33 PM -0500 12/27/01, David Shaw wrote:

> This sounds very good, but what about detached signatures?  A detached
> signature doesn't carry the text with it, so wouldn't the text
> (presumably delivered via http or ftp, which can change line endings)
> need to be re-canonicalized for signature verification?  To a certain
> degree this applies to a clearsigned document as well.


The point of what I'm suggesting is that generating or verifying a
signature is always (or almost always) just hashing the data and going on
with it, no matter what the mode. Text mode comes in when the processing
program can guarantee that the signed data is in canonicalized text. So
most detached sigs must be in binary mode, unless the implementation wants
to transform the text, or otherwise knows that it's in that format. That's
what I mean by it being an assurance. The cryptography is always the same.
Text mode means that if you use a well-defined transform from canonical
form, it will be in the right character set etc.

> Email delivery of the text is a whole other can of worms, though it
> could be argued that if it's email, it should be PGP/MIME.


I suppose that can be argued, but I argue otherwise. I and many other
people delete PGP/MIME without reading, but that's another discussion.
(I've actually been planning such a rant, and was composing it in my head.
Thanks for the opening.)


With regard to text vs. binary, the rule I'm proposing is that when you
create a signature, and when you verify it, you process the actual bits in
a data packet, no matter where that data packet is. The crypto sections of
an OpenPGP system ignore the mode.

The interesting problem, as I understand it, comes from this scenario:

Alice puts on an FTP server a text file, foo.txt, and a detached signature,
foo.sig. Bob FTPs both files, foo.txt in text mode, and foo.sig in binary
mode, and the line ends get changed as part of FTPing the file.
Consequently, the signature doesn't verify.
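
A minimal Python sketch of why this fails. It assumes SHA-1 as a stand-in
for whatever hash the signature uses, assumes the text-mode transfer turns
CRLF into bare LF, and ignores the signature trailer that a real OpenPGP
signature also hashes. The sample text and the digest() helper are
illustrative only:

```python
import hashlib

def digest(data: bytes) -> str:
    """Hash the exact bytes given, as a binary-mode signature would."""
    return hashlib.sha1(data).hexdigest()

# What Alice signed (CRLF line ends) ...
alice_bytes = b"Hello, Bob.\r\nThis is foo.txt.\r\n"
# ... and what Bob holds after a text-mode transfer to a bare-LF system.
bob_bytes = alice_bytes.replace(b"\r\n", b"\n")

# The two byte strings differ, so the digests differ and verification fails.
print(digest(alice_bytes) == digest(bob_bytes))
```

Restoring the CRLF line ends on Bob's copy gives back the original bytes,
and with them the original digest.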

I can think of several answers to this scenario:

(1) Don't do that, you'll hurt yourself. Come on, digital signatures are
fragile and brittle. Text mode operations, be they OpenPGP's, FTP's, or
anyone else's, alter text, and that's going to cause problems.

    (a) Alice puts the file on her system in some well-known container
format that will reliably be handled correctly. Examples of this include
ZIP files, .tar.gz, .gz, .Z, .sit, etc. The signature is over the
container. Bob FTPs them both in binary mode, verifies the signature and
unpacks foo.txt.

    (b) Bob FTPs both files in binary mode and verifies the signature. Then
he translates foo.txt, or re-FTPs it in text mode.

    (c) Bob's OpenPGP program could tell him that the file seems not to be
in "canonical text" and that this may be the reason why it didn't verify.
This is arguably something of a copout, as it can be irritating to have a
program tell you something didn't work, and here's why. I know I always
mutter to the system, "You're the computer, if it's that easy, fix it."
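
The check in (c) could be sketched roughly as follows. This assumes that
"canonical text" means valid UTF-8 with every line end written as CRLF;
the function name looks_canonical is hypothetical and not taken from any
OpenPGP implementation:

```python
def looks_canonical(data: bytes) -> bool:
    """Return True if data is valid UTF-8 and every line end is CRLF."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Once every CRLF pair is removed, no stray CR or LF may remain.
    rest = data.replace(b"\r\n", b"")
    return b"\r" not in rest and b"\n" not in rest
```

A program could call this after a failed verification and mention the
result in its error message, rather than silently guessing.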

(2) Bob's OpenPGP program has a heuristic to try to compensate for a text
translation. Note that I said try. There are many, many ways this can fail.
But there are many ways it can work correctly. If an OpenPGP program (or a
wrapper around one -- this is a perfect place to use a perl script) tries
the straightforward thing of converting line ends to CRLF, it will probably
work just fine. If it tries the next simple attempt -- assuming that the
text is in ISO Latin-1 and converting it to UTF-8 -- it will probably get
most of the rest. Certainly most mail things will work right with these.
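
A rough sketch of such a wrapper, in Python rather than perl. The verify
argument is a hypothetical callback that returns True when the signature
checks out against the given bytes; here a plain SHA-1 comparison stands
in for it. All function names are made up for illustration, and treating
a bare CR as a line end is an extra assumption:

```python
import hashlib
from typing import Callable, Iterator, Optional, Tuple

def to_crlf(data: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF line ends as CRLF."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")

def latin1_to_utf8(data: bytes) -> bytes:
    """Treat the bytes as ISO Latin-1 and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")

def candidates(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield the guesses in order, cheapest and most likely first."""
    yield "as received", data
    crlf = to_crlf(data)
    yield "CRLF line ends", crlf
    yield "CRLF line ends, Latin-1 to UTF-8", latin1_to_utf8(crlf)

def verify_with_heuristics(
    data: bytes, verify: Callable[[bytes], bool]
) -> Optional[str]:
    """Return the label of the first guess that verifies, else None."""
    for label, candidate in candidates(data):
        if verify(candidate):
            return label
    return None

# Stand-in for a real signature check: compare against the signed digest.
signed = "caf\u00e9\r\n".encode("utf-8")
expected = hashlib.sha1(signed).digest()

def check(data: bytes) -> bool:
    return hashlib.sha1(data).digest() == expected

# Bob's copy arrived with bare LF line ends and in Latin-1.
received = "caf\u00e9\n".encode("latin-1")
print(verify_with_heuristics(received, check))
```

Returning the label lets the caller tell Bob which transformation was
needed. The Latin-1 guess comes last because it would mangle text that is
already UTF-8.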

(3) Put the text file in an OpenPGP clearsigned message. I suppose this
really ought to be (1)(d), as it's another "don't do that" solution. But it
works. If the problem is that you have a file that you want to be both
directly readable as text and signed, clearsigning is a way to go.

I realize that this brings the implementer into a subset of the text
heuristic because line ends might be flipped around. But it *should*
already be put into canonical UTF-8, and thus the problem of figuring out
how to verify it is much easier.

        Jon