I think the *key* email extracts are below on this issue,
but this is my choice, and I've been harsh in stripping
things out, to concentrate. I also may have missed some
mails, and left out a couple that seemed to have no traction.
(Apologies to those! Please repost if necessary.)
David Shaw introduced the issue, as well as the issue of
space-stripping in text-mode, which has received relatively
little attention. It may be that if we can crack the nut
of cleartext signature canonicalization, then the text-mode
sig falls out easily.
iang
===================== David Shaw, 2004.02.11
2) 2440 says of the cleartext signature:
    Also, any trailing whitespace (spaces, and tabs, 0x09) at the
    end of any line is ignored when the cleartext signature is
    calculated.
Again, PGP through 8 implements this differently than 2440 says,
where trailing spaces are removed, but trailing tabs are not
(again, PGP 2.x behavior).
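
Illustration (not part of the quoted mail): the two rules being contrasted can be sketched as follows. The function names are hypothetical, and the second function is written only from the description above (trailing spaces removed, trailing tabs kept); it is not taken from any PGP source code.

```python
def strip_trailing_2440(line: bytes) -> bytes:
    """Rule as worded in RFC 2440: drop trailing spaces (0x20) and tabs (0x09)."""
    return line.rstrip(b" \t")


def strip_trailing_pgp(line: bytes) -> bytes:
    """Rule as described for PGP 2.x through 8: drop trailing spaces only."""
    return line.rstrip(b" ")


line = b"hello \t "
print(strip_trailing_2440(line))  # b'hello'
print(strip_trailing_pgp(line))   # b'hello \t'
```

The two functions agree whenever a line has no trailing tab, so the mismatch only shows up as a failed verification on lines where a tab sits in the trailing whitespace.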
===================== Jon Callas, 2004.02.20
How about if we remove any whitespace things, and just
canonicalize line ends? It sounds like Unicode
whitespace may be a huge can of worms. Alternatively,
we could just say trim anything that's <= 0x20, which
is a simple enough thing that solves some obvious
attacks with backspacing and bare CRs to overstrike.
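
Illustration (not part of the quoted mail): the first option, leaving whitespace alone and only canonicalizing line ends, might look like the sketch below. Treating a bare CR as a line end is an assumption made here for the example; the mail does not say how bare CRs should be handled. The name canonicalize_line_ends is hypothetical.

```python
import re

# CRLF must come first in the alternation so it is not split into CR + LF.
_LINE_END = re.compile(rb"\r\n|\r|\n")


def canonicalize_line_ends(text: bytes) -> bytes:
    """Rewrite every line end as CR LF; leave all other bytes untouched."""
    return _LINE_END.sub(b"\r\n", text)


print(canonicalize_line_ends(b"one \nTwo\r\nthree\t\r"))
# b'one \r\nTwo\r\nthree\t\r\n'
```

Trailing spaces and tabs survive here, so a gateway that strips them would still break the signature under this option.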
===================== Ian Grigg, 2004.02.20
My vote would be to trim whitespace and normalise
line endings to CR/NL, where whitespace is <=0x20:
    Also, any trailing whitespace (characters <= 0x20) at the
    end of any line is ignored when the cleartext signature is
    calculated.
I think there should be a comment in there that
indicates what to do with Unicode, just to show
we thought about it, and not waste people's time
asking the question when they are implementing.
Something like:
    Unicode whitespace, where defined, SHOULD NOT be ignored.
Or,
    No Unicode whitespace characters are defined.
Leaving open the possibility of defining them in
an update?
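
Illustration (not part of the quoted mail): a minimal sketch of the proposed rule, assuming the text is handled as bytes (for example UTF-8) and that a bare CR or bare LF also counts as a line end. The name canonicalize is hypothetical. Because only bytes <= 0x20 are trimmed, multi-byte Unicode whitespace is never touched, which matches the "SHOULD NOT be ignored" wording.

```python
# Every byte value from 0x00 through 0x20 inclusive.
TRIM_BYTES = bytes(range(0x21))


def canonicalize(text: bytes) -> bytes:
    """Trim trailing bytes <= 0x20 from each line and join lines with CR LF."""
    unified = text.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = unified.split(b"\n")
    return b"\r\n".join(line.rstrip(TRIM_BYTES) for line in lines)


sample = "abc \t\nd\u00e9f\u00a0 \n".encode("utf-8")
print(canonicalize(sample))
# b'abc\r\nd\xc3\xa9f\xc2\xa0\r\n'
```

In the output, the NO-BREAK SPACE (bytes C2 A0 in UTF-8) stays in place while the ASCII space after it is removed.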
===================== Hal Finney, 2004.02.20
{comprehensive list of unicode spaces, elided}
Therefore I think all of these should be hashed
even if they do occur at the end of a line.
...
The only one left is IDEOGRAPHIC SPACE, which I suspect is the default
space character in ideographic languages (although it's possible they use
ordinary SPACE). I could imagine it being put at the end of a line by
accident, by a Chinese typist or poorly designed word processing program,
so I'd suggest that it should be stripped before hashing.
This is the only one I would suggest adding, along with SPACE.
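
Illustration (not part of the quoted mail): the suggestion can be sketched on decoded text as below. The name strip_trailing_spaces is hypothetical, and whether TAB would stay in the stripped set is not settled by this excerpt, so the sketch strips only the two characters named.

```python
SPACE = "\u0020"
IDEOGRAPHIC_SPACE = "\u3000"


def strip_trailing_spaces(line: str) -> str:
    """Drop trailing SPACE and IDEOGRAPHIC SPACE; hash everything else."""
    return line.rstrip(SPACE + IDEOGRAPHIC_SPACE)


print(repr(strip_trailing_spaces("\u6f22\u5b57\u3000 ")))  # '漢字'
print(repr(strip_trailing_spaces("word\u00a0")))           # 'word\xa0'
```

Any other Unicode space, such as the NO-BREAK SPACE in the second call, is left in place and would therefore be hashed.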
===================== Derek Atkins, 2004.03.08
* Trailing White Space: The issue is that some e-mail gateways strip
trailing white space on lines when processing mail messages. This
causes signature validation failure at a later date. The question is
whether this is an issue that needs to be addressed.
One proposal is to strip EOL characters where the character <= 0x20. From
the floor it was pointed out that this could cause problems from two things.
1) there are some control characters that may be part of the text stream
(such as page feeds) that should not be stripped and 2) for some languages
escape characters for local language processing might produce characters
that are in this character range and thus produce corruption of the text.
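
Illustration (not part of the minutes): the first objection is easy to reproduce. The sketch below assumes the "<= 0x20" rule is applied byte-wise to the end of each line; the name trim_le_0x20 is hypothetical. It does not attempt to model the second objection about language-specific escape characters.

```python
def trim_le_0x20(line: bytes) -> bytes:
    """Strip every trailing byte whose value is <= 0x20."""
    return line.rstrip(bytes(range(0x21)))


page_end = b"End of page 1\x0c"   # line ends with a form feed (0x0C)
print(trim_le_0x20(page_end))     # b'End of page 1'
```

The form feed is dropped before hashing, so a document with and without the page break would verify against the same signature.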
One suggestion was to do the standard MIME type canonicalization and ignore
the rest of the issues. If the message is changed by stripping spaces in a
gateway, then the message correctly fails validation.
As no text has been proposed or was proposed from the floor, the issue was
punted back to the authors to propose some text.
=====================