ietf-openpgp

Re: cleartext signatures - trailing white space - proposal

2004-03-12 06:32:14

Werner Koch wrote:
On Thu, 11 Mar 2004 20:40:56 -0500, Ian Grigg said:


    Also, any trailing whitespace (characters <= 0x20) at the


Please don't define whitespace this way.  I know software using
control characters to separate fields (e.g. STX (0x02) or FS (0x1c))
in a line. Ignoring them in a signature (at the end of a line) might
very well change the content of the message (even if those fields are
empty).


OK, that makes for two votes for an explicit short
list of whitespace characters.


SPACE, LF, CR and TAB are the whitespace characters we have always
used in PGP, and so it should stay - that is also what most
programmers[1] understand by whitespace (cf. K&R).


I think there is a difference between whitespace and
line endings, as far as OpenPGP cleartext signatures
are concerned, at least.

The issue comes when you get files that are garbled
in their line endings:

   line< ><CR><CR><LF>

Or

   line<CR> <LF><CR><LF>

and various other combinations.  In the past, when
coding up that sort of thing, I've adopted the strategy
of saying that any change in the nature of the line
endings is treated immediately as a panic (caller to
fix).  That is, there is at least one manifest error, and
trying to determine the error as being either a line
ending error or a whitespace error makes for too many
complications in the code.
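
The strategy above could be sketched roughly as follows, in Python. The function name split_lines_strict and the choice to raise ValueError are assumptions made for illustration; the only point is that mixed or stray line-ending characters are rejected rather than repaired.

```python
def split_lines_strict(data: bytes) -> list:
    """Split on one consistent line ending; raise on anything garbled."""
    if b"\r\n" in data:
        ending = b"\r\n"
    elif b"\n" in data:
        ending = b"\n"
    elif b"\r" in data:
        ending = b"\r"
    else:
        return [data]          # single line, no line ending at all

    lines = data.split(ending)
    for line in lines:
        # Any CR or LF left inside a line means the endings are mixed
        # or garbled, e.g. b"line \r\r\n" or b"line\r \n\r\n".
        if b"\r" in line or b"\n" in line:
            raise ValueError("inconsistent line endings; caller must fix")
    return lines


def is_garbled(data: bytes) -> bool:
    """Helper for the examples: True if split_lines_strict rejects data."""
    try:
        split_lines_strict(data)
    except ValueError:
        return True
    return False
```

Both garbled examples shown earlier are rejected by this check, while input that uses CRLF throughout, or LF throughout, is split normally.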

In sum, I'm not sure that we want to define whitespace
in this immediate context as including the legal line
ending characters...  Comments?


VT and FF would
also belong to them, but given that we did not use them in PGP, I'd
feel better not to add them now.


If we are going for a list of characters, the shorter
the better, in general.  It seems more likely that
these characters VT, FF, have defined meaning within
the text than are likely to be added later by
transmission gremlins.
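
A minimal sketch, in Python, of what an explicit short list could look like when stripping trailing white space from a single line. The set below (SPACE and TAB only) and the name strip_trailing are assumptions for illustration, not text from the draft; line endings are assumed to have been removed already.

```python
# Assumed short list: SPACE (0x20) and TAB (0x09) only.
TRAILING_WS = bytes([0x20, 0x09])


def strip_trailing(line: bytes) -> bytes:
    """Remove trailing bytes that are in TRAILING_WS, and nothing else."""
    return line.rstrip(TRAILING_WS)


plain = strip_trailing(b"line \t ")
with_fs = strip_trailing(b"a\x1cb\x1c ")   # trailing FS (0x1c) survives
with_ff = strip_trailing(b"page\x0c")      # trailing FF (0x0c) survives
```

With this list, only the space after the final FS is dropped, so empty trailing fields separated by control characters stay part of the signed content.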


Note 4.  And, to clarify Unicode, I suggest adding:



    No exception for Unicode whitespace is defined,
    and all Unicode characters SHOULD NOT be ignored.


With a list of white space characters along with their encoding values,
we won't need that.


I know we don't need it, but without an explicit
mention of Unicode, I suspect there will be
an endless stream of questions, and also, people
will start including their Unicode whitespace
chars because there is no explicit guidance...


[1] Well, speaking of C programmers; don't know about Java.


Perl uses this definition of whitespace:

  \s      A whitespace character      [ \t\n\r\f]

which includes form feeds as 0x0c (I think).

Java uses the java.lang.Character.isWhitespace()
method, which probably depends on the character
set!

I don't know about Python, or Microsoft languages.

This underscores that the ID should NOT rely
on any language's definition of whitespace, and
should seek to define explicitly what is meant.
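
As an illustration in Python: str.rstrip() called without arguments strips whatever the language itself counts as whitespace, which includes FS (0x1c) and the Unicode no-break space, while an explicit list does not. The list used here (SPACE and TAB) and the sample strings are assumptions chosen for the example.

```python
EXPLICIT_WS = " \t"  # assumed short list: SPACE and TAB

record = "field1\x1cfield2\x1c"   # FS-separated fields, last one empty
nbsp_line = "text\u00a0"          # trailing Unicode no-break space

builtin_record = record.rstrip()               # language's own definition
explicit_record = record.rstrip(EXPLICIT_WS)   # explicit list
builtin_nbsp = nbsp_line.rstrip()
explicit_nbsp = nbsp_line.rstrip(EXPLICIT_WS)

print(repr(builtin_record), repr(explicit_record))
print(repr(builtin_nbsp), repr(explicit_nbsp))
```

The built-in call silently removes the trailing FS and the no-break space; the explicit list leaves both lines unchanged.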

iang