[Editor hat off, implementer's-representative hat on]

We at PGP have been talking with David Shaw about an issue we're having
with UTF-8. The problem is that every so often someone takes text that
is not UTF-8 but something like 8859-n, passes it into either GnuPG or
PGP, sends it to the other, and then we end up displaying it wrong. In
the abstract, this is not a problem that can be solved entirely. Heck,
both my mailer and my web browser have menus where I can select what
character set/encoding to assume something is in.

We would like to modify the literal packet to add a type 'u' packet,
which is identical to the type 't' packet except that with it the
implementation is saying, "By gum, I *know* this really, really
contains UTF-8. Trust me. Really."

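As a rough sketch of what that would look like on the wire, here is the
literal packet body from RFC 2440 section 5.9 (format octet, filename
length, filename, four-octet date, data) with the proposed 'u' format
octet. The function name build_literal_body and the charset parameter
are hypothetical, made up for illustration; this is not code from PGP
or GnuPG, and it leaves out the packet header entirely.

```python
import struct


def build_literal_body(fmt: bytes, text: str, charset: str = "latin-1",
                       filename: bytes = b"", mtime: int = 0) -> bytes:
    """Hypothetical helper: body of a literal data packet, text formats only."""
    if fmt not in (b"t", b"u"):
        raise ValueError("this sketch only covers 't' and the proposed 'u'")
    if len(filename) > 255:
        raise ValueError("filename length must fit in one octet")
    if fmt == b"u":
        # Proposed 'u': the sender asserts the data really is UTF-8.
        charset = "utf-8"
    # For 't' the charset is whatever the sender happened to use; the
    # packet carries no record of it, which is the problem being described.
    # Text formats are stored with CRLF line endings.
    canonical = text.replace("\r\n", "\n").replace("\n", "\r\n")
    data = canonical.encode(charset)
    return fmt + bytes([len(filename)]) + filename + struct.pack(">I", mtime) + data
```

The only difference between the 't' and 'u' bodies is the first octet
and the guarantee that comes with it.
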
We've already tested this and both GnuPG and PGP handle a literal 'u'
packet like 'b', which only has the potential drawback of artificial
CRLF line endings. This gives us a way, however, to get proper layering
in the sort of systems that we interact with.

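The two behaviours can be sketched side by side. This is an
illustration only: read_literal_body and the knows_u flag are
hypothetical names, and the fallback branch merely models the "handle
'u' like 'b'" behaviour described above, not the actual code in either
implementation.

```python
import struct


def read_literal_body(body: bytes, knows_u: bool):
    """Hypothetical reader for a literal packet body (no packet header)."""
    fmt = body[0:1]
    name_len = body[1]
    # The four-octet date follows the filename; parsed but unused here.
    (mtime,) = struct.unpack(">I", body[2 + name_len:6 + name_len])
    data = body[6 + name_len:]
    if fmt == b"u" and knows_u:
        # Updated reader: the sender vouched for UTF-8, so decode as such
        # and turn the stored CRLF endings back into plain newlines.
        return data.replace(b"\r\n", b"\n").decode("utf-8")
    if fmt == b"t":
        # Text of unknown charset: fix line endings, leave the bytes alone.
        return data.replace(b"\r\n", b"\n")
    # 'b', or 'u' seen by a reader that treats it like 'b': bytes pass
    # through untouched, so the stored CRLF endings show up in the output.
    return data
```

An older reader therefore still gets the right bytes, just with CRLF
line endings left in place.
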
Any objections?
Jon