Dan Kohn wrote:
I wrote:
I have a simple question. What can a UTF-8 subject header
communicate that an RFC 2047 one can't? Other than inelegance,
what's the downside of 2047, when the upside is a huge increase in
backward compatibility?
I do not know where this discussion took place, but I have an answer to it.
It's a simple fact.
In every single thread I have seen with non-US-ASCII data in the
subject encoded per RFC2047 (sorry, I wrote 2049 by mistake in my last
mail), the subject turned to garbage after 5 or 6 messages.
The reason for that is that all the RFC2047 implementations around are
full of errors.
The reason for that, in turn, is that the RFC2047 encoding is full of
special cases and hard-to-understand rules, and allows an amazing
number of different encodings of the same string.
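
To illustrate the "same string, many encodings" point, here is a
minimal sketch using Python's standard email.header module. The subject
text "Café" and the names SUBJECT, VARIANTS and decode_subject are made
up for this example; the list is far from exhaustive.

```python
from email.header import decode_header, make_header

# Hypothetical subject text, chosen only for illustration.
SUBJECT = "Café"

# Several valid RFC 2047 spellings of that one string.
VARIANTS = [
    "=?utf-8?b?Q2Fmw6k=?=",                # "B" (base64), UTF-8
    "=?utf-8?q?Caf=C3=A9?=",               # "Q" (hex escapes), UTF-8
    "=?UTF-8?Q?Caf=C3=A9?=",               # charset/encoding names ignore case
    "=?utf-8?q?=43=61=66=C3=A9?=",         # every octet escaped
    "=?iso-8859-1?q?Caf=E9?=",             # a different charset, "Q"
    "=?iso-8859-1?b?Q2Fm6Q==?=",           # a different charset, "B"
    "=?utf-8?q?Ca?= =?utf-8?q?f=C3=A9?=",  # split into two encoded-words
]


def decode_subject(raw: str) -> str:
    """Decode an RFC 2047 encoded header value into plain text."""
    return str(make_header(decode_header(raw)))


decoded = [decode_subject(v) for v in VARIANTS]
for variant, text in zip(VARIANTS, decoded):
    print(f"{variant!r:45} -> {text!r}")
```

Every entry should decode to the same text, and each software in a
thread is free to re-encode the subject in whichever form it prefers.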
The analysis of it done during a recent discussion on the usefor
mailing list led to the discovery of incredibly obscure border cases.
These can only result in an implementor getting it wrong, or having to
choose whether to respect the standard and refuse what others produce,
which will make it look like it is the one that gets it wrong, given
the amount of software that produces the incorrect encodings.
And here raw UTF-8 is a clear winner: no complex implementation rules,
no border cases, and one string always has one and only one
representation.
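
As a rough sketch of that claim, again in Python: a given sequence of
characters maps to exactly one UTF-8 byte sequence, and a strict
decoder rejects alternative ("overlong") spellings. The helper names
to_wire, from_wire and is_valid_utf8 are hypothetical, and "Café" is
only an example subject.

```python
def to_wire(subject: str) -> bytes:
    """Encode a subject as raw UTF-8 octets."""
    return subject.encode("utf-8")


def from_wire(raw: bytes) -> str:
    """Decode raw UTF-8 octets; the default error mode is strict."""
    return raw.decode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the octets are well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


wire = to_wire("Café")
round_trip = from_wire(wire)

# 0xC0 0xAF is an overlong spelling of "/"; a strict decoder refuses it,
# so it cannot serve as a second representation of the same character.
overlong_rejected = not is_valid_utf8(b"\xc0\xaf")

print(wire, round_trip, overlong_rejected)
```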
Another choice would be throwing away RFC2047 and devising a new 7-bit
encoding that does not have all these drawbacks.
This has been debated, but not chosen.
In my opinion, these are the main reasons for that:
- reservations about producing yet another encoding (that might itself
have defects that are immediately apparent)
- the fact that, by a very wide majority, the non-US-ASCII world has
*already* chosen raw 8-bit over 7-bit (RFC2047) encoded data.[1]
- such an encoding would have no support at first in the installed base
of software, whereas both RFC2047 and raw UTF-8 are at least already
supported by part of it.
[1] This excludes the parts of the world where the standard encoding for
email is seven bit.