
Re: cleartext signatures - trailing white space - comments

2004-03-11 18:34:55

Comments on some areas, below, assuming continued
debate.  Proposals in next email.

===================== Hal Finney, 2004.02.20
{comprehensive list of unicode spaces, elided}

Therefore I think all of these should be hashed
even if they do occur at the end of a line.


The only one left is IDEOGRAPHIC SPACE, which I suspect is the default
space character in ideographic languages (although it's possible they use
ordinary SPACE).  I could imagine it being put at the end of a line by
accident, by a Chinese typist or poorly designed word processing program,
so I'd suggest that it should be stripped before hashing.

This is the only one I would suggest adding, along with SPACE.

My view on this is that a) Hal's summary is
very good, but not necessarily complete, and
b) I'm not sure we have the wherewithal to
be able to predict even the characters that
are there.

So, I would say that any Unicode whitespace
that is encountered SHOULD NOT be treated as
whitespace, in this context.

Later implementations may divine more
clearly what to do here, in which case
they might be encouraged to create an Armor
Header that states how to treat them.  Otherwise,
the default is that Unicode characters have
no special treatment, for simplicity, IMHO.
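
A rough Python sketch of the two positions, for illustration only.  The
names STRIP_DEFAULT, STRIP_HAL and strip_trailing are hypothetical and
come from no implementation; the strip sets are just the two options
discussed above (SPACE alone, or SPACE plus IDEOGRAPHIC SPACE).

```python
# Hypothetical names, for illustration only.
STRIP_DEFAULT = " "        # SPACE only: no special treatment of Unicode
STRIP_HAL = " \u3000"      # SPACE plus IDEOGRAPHIC SPACE (U+3000)


def strip_trailing(line: str, strip_set: str) -> str:
    """Remove trailing characters of `line` that appear in `strip_set`."""
    return line.rstrip(strip_set)


line = "signed text\u3000"  # line ending in IDEOGRAPHIC SPACE

kept = strip_trailing(line, STRIP_DEFAULT)   # U+3000 stays and is hashed
dropped = strip_trailing(line, STRIP_HAL)    # U+3000 is removed first

# Caution: str.rstrip() with no argument strips every character Python
# considers whitespace, including U+3000 and NO-BREAK SPACE (U+00A0),
# so the strip set has to be passed explicitly.
```

Under the default, the IDEOGRAPHIC SPACE is part of the hashed text;
under Hal's suggestion it is not.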

===================== Derek Atkins, 2004.03.08
* Trailing White Space: The issue is that some e-mail gateways strip
trailing white space on lines when processing mail messages.  This
causes signature validation failure at a later date.  The question is
whether this is an issue that needs to be addressed.

One proposal is to strip EOL characters where the character <= 0x20.  From
the floor it was pointed out that this could cause problems from two things.
1) there are some control characters that may be part of the text stream
(such as page feeds) that should not be stripped and

I'm unsure what to make of this - any comments?

If page feeds shouldn't be stripped, then maybe
backspaces shouldn't be stripped, and we are
back to space/tabs being stripped?
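
To make the difference concrete, a small Python sketch comparing the
"<= 0x20" rule with a space/tab-only rule.  Both function names are
hypothetical, and this only shows how each rule treats a trailing page
feed or backspace; it is not taken from any implementation.

```python
def strip_le_0x20(line: str) -> str:
    """Strip every trailing character whose code point is <= 0x20."""
    end = len(line)
    while end > 0 and ord(line[end - 1]) <= 0x20:
        end -= 1
    return line[:end]


def strip_space_tab(line: str) -> str:
    """Strip only trailing SPACE (0x20) and TAB (0x09)."""
    return line.rstrip(" \t")


page_end = "end of page \f"   # text, a space, then a page feed (0x0C)
typo_fix = "abc\b"            # text followed by a backspace (0x08)

# The "<= 0x20" rule removes the control characters as well.
assert strip_le_0x20(page_end) == "end of page"
assert strip_le_0x20(typo_fix) == "abc"

# The space/tab rule leaves both lines untouched, because the last
# character of each line is neither SPACE nor TAB.
assert strip_space_tab(page_end) == page_end
assert strip_space_tab(typo_fix) == typo_fix
```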

> 2) for some languages
escape characters for local language processing might produce characters
that are in this character range and thus produce corruption of the text.

The way I see this is:

If a character set outside Unicode is being used,
then that should be indicated in the Armor Headers,
and then interpreted properly such that corruption
is not present.  If not, it will also muck up on
line endings CR/NL.

Elsewise it is in Unicode, and the rules apply.

One suggestion was to do the standard MIME type canonicalization and ignore
the rest of the issues.  If the message is changed by stripping spaces in a
gateway, then the message correctly fails validation.

I'm not quite sure how MIME canonicalization works,
but the issue is wider than mail, things like
cut&paste are widely used for cleartext signed
documents, and these tools tend to add spaces
on the end.
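
A minimal Python sketch of why stripping before hashing helps with
cut&paste.  The assumptions are loud ones: SHA-256 merely stands in for
whatever hash the signature uses, the strip set is space/tab only, and
canonicalize and digest are hypothetical names.  A real cleartext
signature hashes more than just the text.

```python
import hashlib


def canonicalize(text: str) -> bytes:
    """Normalise line endings to CRLF and drop trailing spaces/tabs."""
    # Split on CR LF, CR or LF only, so page feeds stay inside the line.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    stripped = [line.rstrip(" \t") for line in lines]
    return "\r\n".join(stripped).encode("utf-8")


def digest(text: str) -> str:
    """Hash the canonical form (SHA-256 used purely for illustration)."""
    return hashlib.sha256(canonicalize(text)).hexdigest()


signed = "Hello\nWorld\n"
pasted = "Hello  \r\nWorld\t\r\n"   # a tool added trailing space and tab
edited = "Hello\nWorld!\n"          # the content really changed

assert digest(signed) == digest(pasted)   # survives cut&paste
assert digest(signed) != digest(edited)   # a real change still fails
```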

iang (proposal to follow)