ietf-822

Re: UTF-7 vs. UTF-8 for fallback charset?

2001-12-10 21:54:05

--On Wednesday, December 5, 2001 16:08 +0100 Marc Mutz 
<mutz(_at_)kde(_dot_)org> wrote:
> KMail now has UTF-7 support (ie. it does understand that charset, but
> doesn't use it actively).
> Are there any interoperability concerns with UTF-7?

Yes, there are.

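First, many people don't realize that UTF-7 is actually a double encoding: it is a second layer of encoding on top of UTF-16, which is itself an encoding of UCS-4. As of the most recent Unicode spec (3.1), characters are assigned outside the Basic Multilingual Plane (for example, the CJK Extension B ideographs), and UTF-16 can represent those only as surrogate pairs, so this is a real problem for non-English languages. UTF-8 is a single encoding (yes, quoted-printable UTF-8 is double encoded, but the two layers are handled cleanly and separately, and quoted-printable isn't needed often).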
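To make the layering concrete, here is a minimal Python sketch (an illustration, not something from the original thread) that takes one character outside the BMP, U+20000, and builds its UTF-7 form by hand: first UTF-16 code units, then base64 with the padding stripped and the '+' ... '-' shift markers around it. CPython's built-in `utf-7` codec is printed alongside for comparison; the helper names `utf16_units` and `utf7_by_hand` are hypothetical.

```python
import base64


def utf16_units(ch: str) -> list[str]:
    """Return the UTF-16 code units for one character, as hex strings."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


def utf7_by_hand(ch: str) -> bytes:
    """Build the UTF-7 form of one non-ASCII character in two explicit layers.

    Layer 1: encode the character as UTF-16 (a surrogate pair if it lies
    outside the Basic Multilingual Plane).
    Layer 2: base64-encode those bytes, drop the '=' padding, and wrap the
    result in the '+' ... '-' shift markers (RFC 2152).
    """
    utf16 = ch.encode("utf-16-be")
    b64 = base64.b64encode(utf16).rstrip(b"=")
    return b"+" + b64 + b"-"


if __name__ == "__main__":
    ch = "\U00020000"  # first CJK Unified Ideographs Extension B character
    print("code point:  ", f"U+{ord(ch):04X}")
    print("UTF-16 units:", utf16_units(ch))      # two units: a surrogate pair
    print("UTF-7 (hand):", utf7_by_hand(ch))
    print("UTF-7 codec: ", ch.encode("utf-7"))   # CPython's built-in codec
    print("UTF-8:       ", ch.encode("utf-8"))   # one layer, four bytes
```

The UTF-16 step already needs two code units for this one character, and UTF-7 then re-encodes those units; UTF-8 maps the code point straight to bytes.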

Also, suppose I don't have a Unicode-aware email client, but I do have a Unicode-aware editor. Most Unicode-aware editors support UTF-16 and UTF-8 but not UTF-7, so a recipient is more likely to be able to read UTF-8 than UTF-7. And in the rare case where a 7-bit encoding is needed, virtually every client can decode quoted-printable, while only UTF-7-aware clients can do anything useful with UTF-7.
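A small Python sketch of that fallback argument, under stated assumptions: the standard-library `quopri` module stands in for "virtually every client" that can strip quoted-printable, and decoding bytes as UTF-8 stands in for a Unicode-aware editor that doesn't know UTF-7. The sample text and the function name `what_a_non_utf7_editor_sees` are illustrative only.

```python
import quopri

SAMPLE = "Grüße aus Köln, 日本語"  # illustrative text with non-ASCII characters


def what_a_non_utf7_editor_sees(text: str) -> dict[str, str]:
    """Compare two 7-bit-safe transports for the same text."""
    # Option 1: UTF-8 wrapped in quoted-printable (two separate layers).
    qp = quopri.encodestring(text.encode("utf-8"))
    # Any QP decoder removes the transfer layer; the result is plain UTF-8.
    after_qp = quopri.decodestring(qp).decode("utf-8")

    # Option 2: UTF-7. The bytes are pure ASCII, so a UTF-8 editor opens
    # them without error, but the non-ASCII characters stay base64-encoded.
    utf7 = text.encode("utf-7")
    utf7_as_utf8 = utf7.decode("utf-8")

    return {
        "qp-wire": qp.decode("ascii"),
        "qp-decoded": after_qp,
        "utf7-wire": utf7.decode("ascii"),
        "utf7-in-utf8-editor": utf7_as_utf8,
    }


if __name__ == "__main__":
    for label, value in what_a_non_utf7_editor_sees(SAMPLE).items():
        print(f"{label:>20}: {value}")
```

Both wire forms are 7-bit clean, but only the quoted-printable one turns back into readable text without a UTF-7 decoder.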

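Also note that US-ASCII is _not_ a subset of UTF-7, since UTF-7 steals a character as an escape: "+" begins a base64 shift sequence, so a literal "+" must be written as "+-". This can cause all sorts of interoperability problems, because plain ASCII text containing "+" may be garbled or rejected when it is decoded as UTF-7.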
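A minimal Python sketch of the escape problem, relying on CPython's built-in `utf-7` codec: the same ASCII string encodes to different bytes under US-ASCII and UTF-7, and ASCII bytes containing "+" can fail to decode as UTF-7. The helper names `ascii_vs_utf7` and `read_ascii_as_utf7` are hypothetical.

```python
from typing import Optional


def ascii_vs_utf7(text: str) -> tuple[bytes, bytes]:
    """Encode an ASCII string both ways; differing bytes mean US-ASCII isn't a subset of UTF-7."""
    return text.encode("ascii"), text.encode("utf-7")


def read_ascii_as_utf7(data: bytes) -> Optional[str]:
    """Decode bytes that were really US-ASCII as if they were UTF-7.

    Returns None when the UTF-7 decoder rejects the input.
    """
    try:
        return data.decode("utf-7")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(ascii_vs_utf7("1+1=2"))       # the '+' has to become '+-' in UTF-7
    print(read_ascii_as_utf7(b"x+y"))  # '+y' is read as the start of a base64 run
```

Whether a stray "+" produces an error or garbled text depends on what follows it; either way the recipient doesn't see the original ASCII.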

I could go on with more interop problems.

UTF-7 is a really bad idea in email.  _Please_ don't generate it.

                - Chris