ietf-openpgp

Re: Let's resolve the end-of-line and whitespace question

2004-02-20 08:20:06

Jon Callas wrote:
a) it would be useful if the whitespace were
clarified to eliminate potential Unicode white
space.

<http://www.imc.org/ietf-openpgp/mail-archive/msg07714.html>


If someone will let me know what they are, I'll put them in.


The messages in the above thread seem to have been
renumbered to these:

http://www.imc.org/ietf-openpgp/mail-archive/msg04791.html
http://www.imc.org/ietf-openpgp/mail-archive/msg04793.html


Here's a summary:

Text encoded in UTF-8 can include whitespace
characters in seemingly arbitrary forms.  As the
Unicode space is very large, it seems unreasonable
for the OpenPGP WG to attempt to track what is
whitespace from a complete, UTF-8, point of
view.

However, whitespace is declared as being stripped,
and the implication in the rfc2440 document is
that this means the common us-ascii whitespace
characters specifically.

So a way through this may be to say, in the
context of cleartext signatures:

    a) us-ascii whitespace is always stripped from
       the end of lines in signature calculation,
       both in signing and verification.

    b) UTF-8 whitespace may be stripped from the
       end of lines in signing, but in this case, the
       document should be transmitted in its clean
       interchange form, with these characters
       cleaned from the end of lines, so that the
       recipient can calculate (verify) correctly.

       Ref: Jon Callas,
       http://www.imc.org/ietf-openpgp/mail-archive/msg03753.html
       which would merit being included as a policy
       within the rfc2440 (that is, cleartext sig
       preparation should change documents into the
       interchange format, and thus strip whitespace
       proactively).

    c) us-ascii whitespace, for the purposes of the
       RFC2440 cleartext signature calculations in
       a) above, is defined to be one of:

         1.  space (0x20), tab (0x09), nl (0x0a),
             cr (0x0d) ...
         2.  everything <= 0x20, or
         3.  or...

       (pick one) - the essence being here that the
       document does not define which it is.
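
To make the difference between candidates 1 and 2
concrete, here is a minimal sketch in Python of the
stripping described in a).  It is illustrative
only, not proposed text: the names OPTION_1,
OPTION_2, strip_trailing and signature_input are
hypothetical, and it assumes lines are split on LF
and rejoined with CR LF before hashing.

```python
# Hypothetical sketch: the two candidate definitions of us-ascii
# whitespace from c), applied to the end of each line as in a).
OPTION_1 = frozenset(b" \t\n\r")    # space, tab, nl, cr
OPTION_2 = frozenset(range(0x21))   # everything <= 0x20


def strip_trailing(line, whitespace=OPTION_1):
    """Remove bytes in `whitespace` from the end of one line."""
    end = len(line)
    while end > 0 and line[end - 1] in whitespace:
        end -= 1
    return line[:end]


def signature_input(text, whitespace=OPTION_1):
    """Strip every line, then rejoin with CR LF (an assumption here).

    Splitting on LF leaves any CR at the end of the line, where it is
    stripped because CR is a member of both candidate sets.
    """
    lines = [strip_trailing(line, whitespace) for line in text.split(b"\n")]
    return b"\r\n".join(lines)


if __name__ == "__main__":
    sample = b"hello \t\r\nworld\x0b\nend"
    print(signature_input(sample, OPTION_1))
    print(signature_input(sample, OPTION_2))
    # A UTF-8 encoded NO-BREAK SPACE (U+00A0) is untouched by either set.
    print(strip_trailing("x\u00a0".encode("utf-8"), OPTION_2))
```

The two candidates differ on a byte such as
vertical tab (0x0b): candidate 1 leaves it in the
hashed text, candidate 2 removes it.  Neither
touches multi-byte UTF-8 whitespace, which is why
b) asks the signer to clean such characters before
transmission.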

Assuming this is acceptable to the WG, the above
messages include some example suggested text
(now 04791, 04793;  which google previously recorded
as 07728, 07718).

Alternatively, if the WG has decided to define
UTF-8 whitespace and strip it from cleartext, is
that something that happened in Seoul?  Are there
notes on this?

iang