Quoting blilly(_at_)erols(_dot_)com, on Mon, Feb 10, 2003 at 02:57:26PM -0500:
Because there are old messages, all of the existing methods need to
continue to be supported indefinitely so that those old messages can
still be read. A transition requires backward compatibility so that
the infrastructure doesn't suddenly break (as that would not constitute
a transition), and it requires a feasible plan. The Usefor draft
breaks backward compatibility and provides no feasible transition plan.
That's not "working on solutions", that's *compounding* the real problem.
All currently valid email is 7bit/ASCII. Its meaning will not change if
future email defines a meaning for 8bit message headers and assigns
that meaning to some character set, such as utf-8.
So, it is backwards compatible in this sense, is it not?
In theory, it is not backwards compatible with the SMTP transport, since
that expects messages to be 7bit ASCII.
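To make the compatibility point concrete, here is a small Python sketch. The function name is_7bit and the sample headers are mine, purely for illustration. It shows that a 7bit header reads identically whether it is interpreted as ASCII or as utf-8, while a utf-8 header carries octets above 0x7F, which a strict 7bit transport does not promise to carry.

```python
def is_7bit(raw: bytes) -> bool:
    """True if every octet is below 0x80 (hypothetical helper)."""
    return all(b < 0x80 for b in raw)

# An existing 7bit header decodes to the same text as ASCII or as utf-8,
# so giving 8bit octets a utf-8 meaning does not change its meaning.
legacy = b"Subject: Lunch on Friday?"
assert legacy.decode("ascii") == legacy.decode("utf-8")

# A utf-8 header contains octets above 0x7F, so it is not 7bit.
new = "Subject: Déjeuner vendredi ?".encode("utf-8")
print(is_7bit(legacy), is_7bit(new))  # True False
```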
In practice, I get 8bit messages (mostly spam, but some from native
French speakers) very frequently.
So, a sender of a message with utf-8 in the headers may find that it is
not delivered. This doesn't sound like a catastrophic break in the current
messaging system. It actually sounds like the only people who will
notice are the senders and receivers, and nobody else.
This disturbs me.
If you genuinely believe that the Usefor draft solves problems rather
than compounding them, you are indeed seriously disturbed.
The lack of technical content and the incredibly personal comments in
this debate disturb me! I don't understand why it's like that.
The main objections to utf-8 becoming the "native" charset of internet
messages seem to be:
- objections to utf-8 as a standard/"privileged" character set
Fair enough. It has some problems, as probably any charset would. The
lack of a standard charset has its own pretty serious drawbacks: no
distinguished encoding is possible for X.509 certificates, for example.
However, a number of IETF protocol families, like PKIX, are moving to
utf-8 rather than having to deal with multiple national language
encodings.
- It is incompatible with RFC[2]822
In my mind, it is a "compatible" extension of RFC2822. It does not
change the meaning of any currently valid messages.
A utf-8 message, of course, does NOT have a defined meaning to an
RFC822 UA. One could argue that neither do RFC2047 encoded messages.
Seeing =?iso-8859-1?b?45;lakdfj322lkdkd?= as a subject isn't much
better than how my UA displays Korean.
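For reference, an RFC2047 encoded-word is just the charset, an encoding letter and the encoded octets wrapped in =?...?=. Here is a rough sketch using Python's standard library. encode_word is a hypothetical helper: it does no line folding and ignores the 75-character limit on encoded-words.

```python
import base64
from email.header import decode_header, make_header

def encode_word(text: str, charset: str = "utf-8") -> str:
    """Build one RFC 2047 'B'-encoded word (no folding, no length check)."""
    data = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?b?{data}?="

word = encode_word("Grüße")
print(word)  # =?utf-8?b?R3LDvMOfZQ==?=

# An RFC2047-aware UA decodes it back; an RFC822-only UA shows the raw word.
print(str(make_header(decode_header(word))))  # Grüße
```

The raw form is what a UA that doesn't know RFC2047 puts on the screen, which is the point above about it being no more readable than undecoded Korean.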
Anyhow, there are different shades of backwards compatibility. S/MIMEv3
messages, for example, can fail to be handled by S/MIME agents that
used to be valid S/MIME implementations. Of course, they should have
made it through the transport, at least, leading to...
- It is incompatible with SMTP.
This is true. A valid SMTP implementation does not have to transfer
messages that aren't pure ASCII. However, they seem to do so fairly
frequently!
The interesting questions seem to be:
1 - does this mean that it can't be standardized?
It WILL be transported by some SMTP implementations, and by all NNTP
ones. But, I can see a strong objection to allowing a message format
that "may or may not be" transportable.
2 - can a utf-8 encoded message be down-coded during transport?
This is the real problem, it seems, and it seems to be a fundamental
property of the RFC822 format: the header field formats aren't
self-describing. It's not possible to know whether a header field is
unstructured or structured, and if structured, whether words in it are
allowed to be encoded. Because of that, it's not possible to encode or
decode a field without knowing its definition, and an automated grep of
all RFCs to determine that would be a little much to ask.
Much as I dislike writing BER codecs, I have to admit that ASN.1 and
XML are better this way.
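Here's roughly what that means for a gateway, as a Python sketch. The field table is hypothetical and tiny, the mailbox parsing handles only a single unquoted "Name <addr>" form, and downcode/encode_word are names I made up. The point is only that the same utf-8 text needs different treatment depending on the field, and an unknown field can't be handled at all.

```python
import base64
import re

# Hypothetical field table. A real gateway would need an entry for every
# field defined anywhere in the RFCs, which is exactly the problem.
UNSTRUCTURED = {"subject", "comments"}
MAILBOX = {"from", "sender", "reply-to"}

def encode_word(text: str) -> str:
    """Single RFC 2047 B-encoded word (no folding, no length check)."""
    return "=?utf-8?b?" + base64.b64encode(text.encode("utf-8")).decode("ascii") + "?="

def downcode(name: str, value: str) -> str:
    if value.isascii():
        return value                  # already 7bit, leave untouched
    field = name.lower()
    if field in UNSTRUCTURED:
        return encode_word(value)     # whole body is free text
    if field in MAILBOX:
        # Simplified: one unquoted "Display Name <addr-spec>" mailbox.
        # Only the display-name phrase may become an encoded-word;
        # RFC 2047 forbids encoded-words inside an addr-spec.
        m = re.fullmatch(r"\s*(.*?)\s*<([^<>]*)>\s*", value)
        if m and m.group(1) and m.group(2).isascii():
            return f"{encode_word(m.group(1))} <{m.group(2)}>"
    # Unknown field, or a structure this sketch cannot safely rewrite.
    raise ValueError(f"cannot down-code {name!r} without its definition")

print(downcode("Subject", "Grüße"))
print(downcode("From", "Zoë <zoe@example.com>"))
```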
So, you can only transform some fields, like the ones that you know
are allowed to contain utf-8 because they are in the USEFOR draft.
What about the others? What about throwing away experimental headers
that have binary in them? Or leaving them raw, at the gateway admin's
option?
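The gateway policy for everything else could look something like this. Again, this is only a sketch: KNOWN, RawPolicy and route_headers are hypothetical, and which fields belong in KNOWN would come from whatever the standard ends up allowing.

```python
from enum import Enum

class RawPolicy(Enum):
    DROP = "drop"       # discard unknown fields that contain 8bit octets
    PASS_RAW = "pass"   # forward them untouched

# Hypothetical set of fields the gateway knows how to down-code.
KNOWN = {"subject", "from"}

def route_headers(headers, policy):
    """Split (name, raw_bytes) pairs into fields to transform, fields to
    emit as-is, and names of dropped fields, per the admin's policy."""
    transform, emit, dropped = [], [], []
    for name, raw in headers:
        if all(b < 0x80 for b in raw):
            emit.append((name, raw))        # 7bit: nothing to do
        elif name.lower() in KNOWN:
            transform.append((name, raw))   # definition known: can down-code
        elif policy is RawPolicy.PASS_RAW:
            emit.append((name, raw))        # unknown 8bit, forwarded raw
        else:
            dropped.append(name)            # unknown 8bit, discarded
    return transform, emit, dropped

hdrs = [("Subject", "Grüße".encode("utf-8")),
        ("X-Test", "café".encode("utf-8")),
        ("Date", b"Mon, 10 Feb 2003")]
print(route_headers(hdrs, RawPolicy.DROP)[2])  # ['X-Test']
```

With PASS_RAW instead, X-Test goes out untouched and the next hop has to cope with it, which is exactly the operational question.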
What are the problems with this approach, operationally?
This seems to be a really important issue, and speaking as an
implementor: if the mail standards HAVE to be as baroque and difficult
as they are, fine, I can keep dealing with them, but I would really,
really like to know the design rationale, because utf-8 sure does seem
like it would solve a whole lot of problems.
The RFCs are fairly lacking in any "design and architecture of the IETF
text messaging system" section. "Read the mailing list archives" seems
to be the standard cop-out, but the flame-to-signal ratio is starting
to make my cheeks burn!
Cheers,
Sam