ietf-openpgp

Re: cleartext signed messages - UTF-8 - stripping the whitespace

2004-01-06 10:32:02

Adrian 'Dagurashibanipal' von Bidder wrote:

On Monday 05 January 2004 16:24, Ian Grigg wrote:

...
3.  What was the original deep dark motivation
    for stripping whitespace from the end of lines
    anyway?

4.  Do we care if UTF-8 has some weird whitespace/
    line endings?

IIRC from previous discussions (I wasn't around back when PGP was
introduced to the world...), some mailers (MUAs and MTAs) used to strip
whitespace occasionally or do other weird things.


Thinking about it, I've come across editors and
desktops that do something similar: they add spaces
to the end of lines arbitrarily, and sometimes
modify newlines (removing or adding them) at the
end of files (but newlines are already adequately
protected in the draft).


Those old mailers would probably either treat all non-ASCII whitespace and
line-endings as normal characters, or not be 8-bit clean anyway and so cause
problems in any case. So the answer to (4) is probably a clear no.


So, the upshot is that only the defined US-ASCII
whitespace characters should be included in the
canonicalisation:

    ....
    Also, any trailing whitespace (spaces, and tabs, 0x09) at the end of
    any line is ignored when the cleartext signature is calculated.
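A minimal sketch of that whitespace rule in Python, assuming the text has already been decoded from UTF-8 and that lines are rejoined with CR LF before hashing. The function name canonicalize is hypothetical, and the sketch covers only the trailing-whitespace rule, not the rest of the cleartext framework (dash-escaping, armor headers):

```python
# Sketch only: trailing-whitespace canonicalisation for cleartext signing,
# assuming only ASCII space (0x20) and tab (0x09) count as whitespace.
ASCII_TRAILING_WS = " \t"


def canonicalize(text: str) -> bytes:
    """Hypothetical helper: drop trailing 0x20/0x09 from every line,
    join lines with CR LF, and encode the result as UTF-8."""
    # Fold CR LF to LF first, then split on LF only, so no Unicode
    # line separators (U+2028, U+0085, ...) are treated as line ends.
    lines = text.replace("\r\n", "\n").split("\n")
    # Pass an explicit character set: a bare rstrip() would also remove
    # non-ASCII whitespace such as U+00A0.
    cleaned = [line.rstrip(ASCII_TRAILING_WS) for line in lines]
    return "\r\n".join(cleaned).encode("utf-8")


if __name__ == "__main__":
    sample = "Hello  \t\nNBSP stays\u00a0\n"
    print(canonicalize(sample))
```

The explicit " \t" argument and the split on LF are deliberate. Python's bare str.rstrip() and str.splitlines() treat some non-ASCII characters (U+00A0, U+2028 and others) as whitespace or line breaks, which would quietly widen the rule beyond 0x20 and 0x09.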

In that case, it might be worth adding a comment
to that effect.  Because the future is hard to
predict here, I'd suggest the following:


    Also, any trailing whitespace (0x20, 0x09) at the end of
    any line is ignored when the cleartext signature is calculated.
    Implementations MAY elect to clean line endings of whitespace
    in the final signed form of the document, including UTF-8 forms.


Thus, when we hit upon some troublesome mailer that
mangles non-English Word-prepared UTF-8 documents,
the OpenPGP plugin can pre-emptively clean them up and
still be in accord with the standard.
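One reading of the proposed MAY clause, sketched in Python: the signer removes all trailing Unicode whitespace before signing, so a mailer that later drops, say, a trailing U+00A0 has nothing left to change. The helpers preclean and nbsp_stripping_mailer are hypothetical. SHA-256 over the canonical text stands in for the real signature computation, which also hashes signature packet data:

```python
import hashlib


def canonical_digest(text: str) -> str:
    # Hash only the canonical form: trailing 0x20/0x09 removed per line,
    # CR LF line endings. SHA-256 is a stand-in for illustration.
    lines = text.replace("\r\n", "\n").split("\n")
    canon = "\r\n".join(line.rstrip(" \t") for line in lines)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def preclean(text: str) -> str:
    # Hypothetical MAY-clause behaviour: before signing, remove any
    # trailing whitespace Python recognises, including U+00A0 and U+3000.
    lines = text.replace("\r\n", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines)


def nbsp_stripping_mailer(text: str) -> str:
    # Hypothetical troublesome mailer that drops trailing U+00A0.
    return "\n".join(line.rstrip("\u00a0") for line in text.split("\n"))


doc = "Bonjour\u00a0\nsuite\n"

# Signed as-is: the mailer changes the canonical form, so verification fails.
assert canonical_digest(doc) != canonical_digest(nbsp_stripping_mailer(doc))

# Pre-cleaned: nothing is left for the mailer to strip, so the digest holds.
signed = preclean(doc)
assert canonical_digest(signed) == canonical_digest(nbsp_stripping_mailer(signed))
```

The canonical rule itself stays ASCII-only; the extra cleaning happens only in the text the signer emits, which is why it can remain within the standard.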

If applications start adding UTF-8 whitespace afterwards,
then we have more of a problem.  I guess at that point,
implementations can agree to add additional characters
to the "ignore" list.
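If implementations did agree on extra characters, the change amounts to widening the set passed to the strip. In this sketch only 0x20 and 0x09 come from the draft; EXTRA_IGNORED and its two code points (NO-BREAK SPACE, IDEOGRAPHIC SPACE) are hypothetical choices for illustration:

```python
# Only 0x20 and 0x09 come from the draft; EXTRA_IGNORED is a
# hypothetical, later-agreed extension used purely for illustration.
DRAFT_IGNORED = " \t"
EXTRA_IGNORED = "\u00a0\u3000"  # NO-BREAK SPACE, IDEOGRAPHIC SPACE


def canonicalize(text: str, ignored: str = DRAFT_IGNORED) -> bytes:
    # Strip the given trailing characters per line, rejoin with CR LF.
    lines = text.replace("\r\n", "\n").split("\n")
    return "\r\n".join(line.rstrip(ignored) for line in lines).encode("utf-8")


# Three CJK characters followed by an ideographic space (U+3000).
line = "\u65e5\u672c\u8a9e\u3000\n"
print(canonicalize(line))                                  # U+3000 kept
print(canonicalize(line, DRAFT_IGNORED + EXTRA_IGNORED))   # U+3000 removed
```

The catch is interoperability. If a signer strips U+3000 from the hashed form but the transmitted text still ends a line with it, a verifier using only the draft's list will hash a different byte string and reject the signature. An extended list therefore only works once both sides use it.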

iang
