UTF-8 and literal packets


[Editor hat off, representative of implementer hat on]

We at PGP have been talking with David Shaw about an issue we're having with UTF-8. The problem is that there are a number of times where someone takes text that is not UTF-8, but something like 8859-n, passes it into either GnuPG or PGP, sends it to the other, and then we end up displaying it wrong. Abstractly, this is not a problem that can be solved entirely. Heck, both my mailer and my web browser have menus where I can select what character set/encoding to assume something is in.

We would like to have a modification to the literal packet, where there is a type 'u' packet which is identical to the type 't' packet except that in this packet the implementation is saying, "By gum, I *know* this really, really contains UTF-8 in it. Trust me. Really."
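
To make the proposal concrete, here is a rough sketch of how a writer might choose the format octet. It assumes the usual literal packet body layout (format octet, one-octet filename length, filename, four-octet date, then the data) and leaves out the packet header. The function name and the policy of falling back to 't' when the input is not valid UTF-8 are illustrative assumptions, not what PGP or GnuPG actually do.

```python
import struct


def build_literal_body(data: bytes, filename: bytes = b"",
                       mtime: int = 0, text: bool = False) -> bytes:
    """Build a literal packet body (hypothetical helper, no packet header)."""
    if len(filename) > 255:
        raise ValueError("filename must fit in a one-octet length field")

    fmt = b"b"
    if text:
        try:
            # Only claim 'u' when the bytes really are well-formed UTF-8.
            data.decode("utf-8")
            fmt = b"u"
        except UnicodeDecodeError:
            # Unknown 8-bit charset (8859-n or similar): plain text packet.
            fmt = b"t"
        # Text is stored with CRLF line endings; normalize LF and CRLF input.
        data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

    return (fmt + bytes([len(filename)]) + filename
            + struct.pack(">I", mtime) + data)
```

With this sketch, UTF-8 input marked as text comes out tagged 'u', while the same string encoded as 8859-1 fails the check and stays 't'.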

We've already tested this and both GnuPG and PGP handle a literal 'u' packet like 'b', which only has the potential drawback of artificial CRLF line endings. This gives us a way, however, to get proper layering in the sort of systems that we interact with.
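
On the reading side, an implementation that knows about 'u' could do something like the sketch below. Again this assumes the usual literal packet body layout with the packet header already stripped; the function name and the `fallback_charset` parameter are hypothetical, standing in for whatever the user picked from a charset menu.

```python
import struct


def read_literal_body(body: bytes, fallback_charset: str = "latin-1"):
    """Parse a literal packet body (hypothetical helper).

    Returns (format octet, filename, date, raw data, decoded text or None).
    """
    fmt = body[0:1]
    name_len = body[1]
    filename = body[2:2 + name_len]
    (mtime,) = struct.unpack(">I", body[2 + name_len:6 + name_len])
    data = body[6 + name_len:]

    if fmt == b"u":
        # The sender asserts UTF-8, so no guessing is needed.
        text = data.decode("utf-8").replace("\r\n", "\n")
    elif fmt == b"t":
        # Charset unknown: fall back to the caller's assumption.
        text = data.decode(fallback_charset, errors="replace").replace("\r\n", "\n")
    else:
        # 'b' and unrecognized types: hand back the raw bytes untouched.
        text = None

    return fmt, filename, mtime, data, text
```

An older reader that takes the last branch for 'u' returns the data unchanged, CRLF endings included, which matches the behavior described above.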


Any objections?

        Jon
