ietf-openpgp

Re: Clearsigning, MIME, etc.

2002-04-18 09:31:15


From: John Dlugosz

Re:
 - ASCII armor proper can be fixed by giving a clear specification of
   the character set issues involved: Either mandate UTF-8, or
   mandate tagging and use UTF-8 as the default.  The current
   language is considerably too fuzzy, and - I believe - mostly
   ignored.


Excellent report and summary--thanks for clearing everything up.

I'd like to make one point.  If ASCII armor, and PGP "stuff" embedded in
the readable text generally, exists to serve systems that have no metadata
for this, then we can't mandate the use of a mail header.  If the message
is stored in a .txt file, it doesn't =have= a mail header!

Listing the charset info in clear text would also help the human reader.
So you could put it in the PGP armor headers, after the -----BEGIN line and
before the blank-line separator.
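To make the placement concrete, here is a minimal Python sketch of reading those armor headers. It assumes a plain "Key: Value" syntax and treats "Charset" as a hypothetical header from this proposal. Dash-escaping is ignored and header values are not validated.

```python
# Sketch only: split a clearsigned message into its armor headers and body.
# "Charset" is a hypothetical header from this proposal, not a defined one.

BEGIN_LINE = "-----BEGIN PGP SIGNED MESSAGE-----"
SIG_LINE = "-----BEGIN PGP SIGNATURE-----"


def parse_armor_headers(message: str) -> tuple[dict[str, str], str]:
    """Return (headers, body) for a clearsigned message.

    Headers are the "Key: Value" lines after the BEGIN line; the first
    blank line ends them. The body runs up to the signature block.
    Dash-escaped lines are left as they are.
    """
    lines = message.splitlines()
    if BEGIN_LINE not in lines:
        raise ValueError("no BEGIN PGP SIGNED MESSAGE line found")
    i = lines.index(BEGIN_LINE) + 1

    headers: dict[str, str] = {}
    while i < len(lines) and lines[i].strip():
        key, sep, value = lines[i].partition(":")
        if not sep:
            raise ValueError(f"malformed armor header: {lines[i]!r}")
        headers[key.strip()] = value.strip()
        i += 1

    body_lines = []
    for line in lines[i + 1:]:
        if line == SIG_LINE:
            break
        body_lines.append(line)
    return headers, "\n".join(body_lines)


msg = (
    "-----BEGIN PGP SIGNED MESSAGE-----\n"
    "Hash: SHA2\n"
    "Charset: ISO-8859-1-Windows-3.1-Latin-1\n"
    "\n"
    "Message starts here...\n"
    "-----BEGIN PGP SIGNATURE-----\n"
)
headers, body = parse_armor_headers(msg)
print(headers["Charset"])  # ISO-8859-1-Windows-3.1-Latin-1
```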

Or it can be a flag in the "encoding", though the only encoded material in
this case is the signature block itself.  Why not both?  If you re-encode
the text file, you can change the human-readable charset header to match,
and at verification time the tool will see the mismatch and either convert
back to the charset the signature was made over or report a suitable error.
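The verify-time check could look roughly like this. It is a sketch under stated assumptions: `signed_charset` stands for the charset recorded in the signature (no such field exists in OpenPGP today), `header_charset` is the human-readable header value, and Python's codec names stand in for whatever charset registry a spec would adopt.

```python
import codecs


class CharsetMismatchError(Exception):
    """Text cannot be turned back into the charset the signature covers."""


def bytes_for_verification(file_bytes: bytes, header_charset: str,
                           signed_charset: str) -> bytes:
    """Recover the bytes the signature was made over (hypothetical scheme).

    header_charset: what the readable Charset header says now.
    signed_charset: what the signature says the text was signed in.
    """
    if codecs.lookup(header_charset).name == codecs.lookup(signed_charset).name:
        return file_bytes  # header and signature agree: verify as-is
    try:
        text = file_bytes.decode(header_charset)
        return text.encode(signed_charset)
    except UnicodeError as exc:  # covers both decode and encode failures
        raise CharsetMismatchError(
            f"cannot convert {header_charset} text back to {signed_charset}"
        ) from exc


signed = "café\n".encode("iso-8859-1")                 # bytes as originally signed
recoded = signed.decode("iso-8859-1").encode("utf-8")  # file re-encoded later
print(bytes_for_verification(recoded, "UTF-8", "ISO-8859-1") == signed)
```

If the re-encoded file contains a character the original charset cannot represent, the helper raises instead of guessing.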

Or the signature can "normalize" the text by converting it to UTF-8
internally.  The verifier would know to do the same, starting from whatever
the file's charset is.
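A minimal sketch of that normalization, assuming the declared charset is known and using SHA-256 as a stand-in for whichever hash is actually chosen. The same text stored in two different charsets yields identical normalized bytes, and therefore an identical digest.

```python
import hashlib


def normalize_to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode with the file's declared charset, re-encode as UTF-8."""
    return raw.decode(charset).encode("utf-8")


def digest(raw: bytes, charset: str) -> str:
    # Hash the normalized form, so the original byte encoding no longer matters.
    return hashlib.sha256(normalize_to_utf8(raw, charset)).hexdigest()


text = "Grüße aus Köln"
as_latin1 = text.encode("iso-8859-1")
as_utf8 = text.encode("utf-8")

print(as_latin1 == as_utf8)                                          # False
print(digest(as_latin1, "ISO-8859-1") == digest(as_utf8, "UTF-8"))  # True
```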

Putting that together, I'd propose something like this:

Use a header inside the PGP envelope to note the message's character set.
     -----BEGIN PGP SIGNED MESSAGE-----
     Hash: SHA2
     Charset: ISO-8859-1-Windows-3.1-Latin-1

     Message starts here...

Now, how many people are going to use the correct official name when it's
such a jawbreaker?  Look at the charset declarations in web pages: very few
get them right.  So the spec had better make the accepted names clear.
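One way a tool could cope is to map every label to a canonical name before comparing. The sketch below borrows Python's built-in codec alias table as a stand-in for a real registry such as IANA's; which aliases it accepts (and whether it knows the long name in the example above) is Python's choice, not something this proposal defines. `canonical_charset` is a hypothetical helper name.

```python
import codecs


def canonical_charset(name: str) -> str:
    """Map a user-supplied charset label to one canonical codec name."""
    try:
        return codecs.lookup(name.strip()).name
    except LookupError:
        raise ValueError(f"unrecognized charset name: {name!r}") from None


# Several spellings people actually write all collapse to one name.
for label in ["ISO-8859-1", "latin-1", "Latin1", "iso_8859_1"]:
    print(label, "->", canonical_charset(label))
```

Comparing canonical names rather than raw strings means a verifier need not care which spelling the sender typed, while still rejecting names it does not recognize.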

Meanwhile, the signature itself (the base64-encoded packets) would contain
a flag stating that normalizing the text included converting it to UTF-8.
Without that flag, it behaves as it does now, taking whatever bytes you
give it unchanged apart from the rules about trailing whitespace and line
breaks.
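Roughly, the bytes fed to the hash could then be computed as below. `utf8_flag` is a hypothetical name for the proposed flag. The line handling (strip trailing spaces and tabs, join lines with CRLF) approximates the existing cleartext rules and is simplified; for example, it assumes an ASCII-compatible charset.

```python
import hashlib


def canonical_text(raw: bytes, charset: str, utf8_flag: bool) -> bytes:
    """Bytes to hash for a cleartext signature under this proposal (sketch).

    utf8_flag set:   transcode from `charset` to UTF-8 first.
    utf8_flag clear: use the bytes unchanged, as today.
    Either way, trailing spaces/tabs are dropped from each line and
    lines are joined with CRLF.
    """
    if utf8_flag:
        raw = raw.decode(charset).encode("utf-8")
    # bytes.splitlines() only splits on \n, \r and \r\n.
    lines = raw.splitlines()
    return b"\r\n".join(line.rstrip(b" \t") for line in lines)


sample = "caf\u00e9  \nline two\t\r\n".encode("iso-8859-1")
print(canonical_text(sample, "ISO-8859-1", utf8_flag=False))  # b'caf\xe9\r\nline two'
print(canonical_text(sample, "ISO-8859-1", utf8_flag=True))   # b'caf\xc3\xa9\r\nline two'
print(hashlib.sha256(canonical_text(sample, "ISO-8859-1", True)).hexdigest())
```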

In summary, I think recording this information in two places (the clear
text and the sig) is valuable: it allows the text block to be re-encoded
for display while the signature still checks.  An existing MUA won't know
to update the Charset line, but between that value, the mail header, and
your machine's configuration, you have enough information to figure out
just what it did!
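A rough end-to-end sketch of that argument, assuming the UTF-8 flag was set at signing time and using SHA-256 as a placeholder hash. `find_matching_charset` is a hypothetical helper: it tries the candidate charsets a verifier might gather (the Charset line, the mail header, the local default) and reports which one reproduces the signed digest. Each candidate must still reproduce the exact digest; the list only helps pick the right decoding.

```python
import hashlib


def signed_digest(body: bytes, charset: str) -> str:
    """Digest under the proposed UTF-8 normalization (sketch)."""
    text = body.decode(charset).encode("utf-8")
    lines = [line.rstrip(b" \t") for line in text.splitlines()]
    return hashlib.sha256(b"\r\n".join(lines)).hexdigest()


def find_matching_charset(body: bytes, expected: str, candidates: list[str]):
    """Return the first candidate charset that reproduces `expected`, else None."""
    for charset in candidates:
        try:
            if signed_digest(body, charset) == expected:
                return charset
        except (LookupError, UnicodeDecodeError):
            continue  # unknown name, or bytes invalid in that charset
    return None


# Signed as ISO-8859-1 ...
original = "Grüße aus Köln\n".encode("iso-8859-1")
expected = signed_digest(original, "ISO-8859-1")

# ... then an MUA re-codes it to UTF-8 but leaves the old Charset line alone.
displayed = original.decode("iso-8859-1").encode("utf-8")
stale_charset_line = "ISO-8859-1"
mail_header_charset = "UTF-8"

print(signed_digest(displayed, stale_charset_line) == expected)  # False
print(find_matching_charset(displayed, expected,
                            [stale_charset_line, mail_header_charset]))  # UTF-8
```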
