Re: [openpgp] Character encodings

2015-03-17 14:27:09
In my experience, the situation is not getting any better for PGP messages
that are not composed in a basic text editor.  Users composing messages on
mobile devices, for example, do not always default to UTF-8; they use the
system-wide character encoding setting (or the charset specified by the
composing app itself).

For example, on iOS, Apple basically says that if you don't know the
original encoding, you have to "guess" by trying various encodings until
you find one that works.
Fortunately, it usually only takes a few tries to get it right if it's not

I agree that UTF-8 should be preferred and enforced wherever possible. But
in cases where it is not, it would help if the sender were able to provide a
hint as to what the encoding actually is, and to do so in a standardized
manner that can be easily implemented.

On Tue, Mar 17, 2015 at 3:00 PM Tim Bray <tbray(_at_)textuality(_dot_)com> wrote:

This would be a huge step backward. The proportion of text on the internet
that is UTF-8 is monotonically increasing toward 100%. Thank goodness.
On Mar 18, 2015 4:38 AM, "Wyllys Ingersoll" <wyllys(_at_)gmail(_dot_)com> wrote:

One area that I think needs some attention is the character encoding and
charsets for encrypted text messages.

RFC 4880 says that everything should be UTF-8.  However, the reality is
that UTF-8 is not used everywhere, and there are lots of clients that
compose messages in their native preferred character set (Latin5, Greek,
Kanji, etc.). It is very difficult for an implementor to figure out which
one was used after the fact without some indication from the sender.

The literal packet format byte only specifies 3 possible values: binary
('b'), UTF-8 text ('u'), or plain text ('t').  The ASCII Armor header may
specify a different charset (though unfortunately very few agents add the
"Charset" armor header).  Additionally, if the message has MIME headers,
there may be yet another charset indicated in MIME that differs from both
the ASCII Armor charset and the literal packet data format byte.
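
The three possible sources of charset information described above can be
sketched in Python as follows. This is only an illustration: the function
names armor_headers and pick_charset are made up, and the precedence order
(format byte, then armor "Charset" header, then MIME charset) is an
assumption, since nothing here defines which source should win when they
disagree.

```python
from typing import Dict, Optional


def armor_headers(armored: str) -> Dict[str, str]:
    """Collect the 'Key: Value' armor headers that follow the BEGIN line.

    Headers end at the first blank line; keys are lower-cased.
    """
    lines = armored.splitlines()
    start = next(
        (i for i, line in enumerate(lines)
         if line.startswith("-----BEGIN PGP ")),
        None,
    )
    headers: Dict[str, str] = {}
    if start is None:
        return headers
    for line in lines[start + 1:]:
        if not line.strip():
            break  # blank line separates headers from the base64 body
        key, sep, value = line.partition(": ")
        if sep:
            headers[key.strip().lower()] = value.strip()
    return headers


def pick_charset(format_byte: str, armor: Dict[str, str],
                 mime_charset: Optional[str] = None) -> Optional[str]:
    """Choose a charset; the precedence used here is an assumption."""
    if format_byte == "u":
        return "utf-8"   # literal packet says UTF-8 text
    if format_byte == "b":
        return None      # binary data: no text charset applies
    # 't' (plain text): fall back to whatever hint is available.
    return armor.get("charset") or mime_charset


sample = (
    "-----BEGIN PGP MESSAGE-----\n"
    "Charset: ISO-8859-7\n"
    "\n"
    "AAAA\n"
    "-----END PGP MESSAGE-----\n"
)
print(pick_charset("t", armor_headers(sample), mime_charset="utf-8"))
```

When all three sources are silent or contradictory, the receiver is back to
guessing, which is the gap the paragraph above points at.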

If the encrypting PGP software knows what character encoding was used to
compose the original message, there should be some definitive way to
communicate this in the message so that the decrypting software can
present it the way it was originally intended.  As an implementor, I find
this one of the trickiest areas to get right so that the end user sees the
message as the sender composed it.

openpgp mailing list
