
UTF-8 and literal packets

2004-03-23 08:09:25

[Editor hat off, representative of implementer hat on]

We at PGP have been talking with David Shaw about an issue we're having with UTF-8. The problem is that there are a number of times where someone takes text that is not UTF-8, but something like 8859-n, passes it into either GnuPG or PGP, sends it to the other, and then we end up displaying it wrong. Abstractly, this is not a problem that can be solved entirely. Heck, both my mailer and my web browser have menus where I can select what character set/encoding to assume something is in.

We would like to have a modification to the literal packet, where there is a type 'u' packet which is identical to the type 't' packet except that in this packet the implementation is saying, "By gum, I *know* this really, really contains UTF-8 in it. Trust me. Really."

We've already tested this, and both GnuPG and PGP handle a literal 'u' packet like 'b', which only has the potential drawback of artificial CRLF line endings. This gives us a way, however, to get proper layering in the sort of systems that we interact with.
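To make the proposal concrete, here is a rough sketch of how a sender might emit a 'u' literal body and how a receiver might use the tag. It assumes the usual literal packet body layout (format octet, filename length, filename, four-octet date, data) and leaves out the packet header and any line-ending canonicalization. The function names and the fallback charset are made up for illustration; they are not taken from GnuPG or PGP.

```python
import struct

def build_utf8_literal_body(text: str, filename: bytes = b"", mtime: int = 0) -> bytes:
    """Build a literal packet body tagged 'u' (hypothetical helper).

    The sender starts from a decoded string, so it really does know
    the payload is UTF-8. No CRLF canonicalization is done here.
    """
    if len(filename) > 255:
        raise ValueError("filename length must fit in one octet")
    data = text.encode("utf-8")
    return b"u" + bytes([len(filename)]) + filename + struct.pack(">I", mtime) + data

def parse_literal_body(body: bytes, fallback_charset: str = "iso-8859-1"):
    """Split a literal packet body and decode it according to its format octet.

    'u' -> the sender asserts UTF-8, so decode strictly as UTF-8.
    't' -> charset unknown; fallback_charset is only a guess (assumption).
    anything else -> treated as binary and returned as raw bytes.
    """
    fmt = body[0:1]
    name_len = body[1]
    # Skip filename (name_len octets) and the four-octet date.
    data = body[2 + name_len + 4:]
    if fmt == b"u":
        return fmt, data.decode("utf-8")
    if fmt == b"t":
        return fmt, data.decode(fallback_charset, errors="replace")
    return fmt, data

if __name__ == "__main__":
    body = build_utf8_literal_body("na\u00efve caf\u00e9", filename=b"note.txt")
    print(parse_literal_body(body))
```

With a 't' packet the receiver is still guessing, which is the display problem described above; with 'u' it has no need to guess.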

Any objections?

