
Transmission issues for transition to UTF-8 Headers

1999-02-12 04:27:48

I've changed the subject field so that matters about UA display, and the like, can be treated separately from the meta-question of interacting email services, namely what basic approach should be used when a UTF-8 host wants to send to some other host.

This subject line is intended to focus just on the question of coordination between sending and receiving systems, in other words on the exchange protocol, SMTP, and the object being exchanged, in this case 822 or 822bis since headers are the issue.

This particular community has quite a large amount of experience making changes to a very large installed base. It has upgraded that base more than once. The recent changes involving MIME and ESMTP represent an extraordinarily successful piece of work, and we should not have to re-invent, experiment, or otherwise spend a great deal of time on a topic that is quite clearly a repeat of these earlier times.

The transition to UTF-8 headers has all of the characteristics of the transition to MIME and ESMTP and we should use what we've learned.

What we've learned:

1. If you insist on "just sending" the new stuff, without getting a statement of support by the receiver, then you MUST send it in a fashion which is safe for old recipients. That is the model used for MIME and that is the reason that MIME has such a horrendous appearance, as well as such a safe record of deployment. A computer scientist shudders at MIME's ugliness. An engineer marvels at its successful deployment over an existing, global base of users.

2. If there is any real chance of impact on the recipient, such as breaking their software, the sender MUST get permission before initiating a 'new' behavior. That is what ESMTP options are for. They work dandy. Since 8-bit encoding is well-known to be a source of problems for recipients not ready for it, it is clear that sending 8-bit headers needs either to have 7-bit encoding or explicit recipient go-ahead. We fought the "just send 8 bits" wars years ago. We should not have to fight them again.

3. Unlabeled defaults tend to be problematic, since there is no detecting when they are changed. While it offends one's sense of "efficiency", it's worth the extra bits to label a long-lived object explicitly.
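
To make point 1 concrete, here is a minimal sketch of the "safe for old recipients" form: a header value carried as an RFC 2047 encoded-word, which is pure 7-bit ASCII on the wire and carries its own charset label (point 3). It uses Python's standard email.header module; the function names and the sample subject are illustrative assumptions, not part of any proposal in this thread.

```python
from email.header import Header, decode_header, make_header


def encode_header_7bit(text):
    """Return an RFC 2047 encoded-word form of text (ASCII only, charset labeled)."""
    return Header(text, charset="utf-8").encode()


def decode_header_value(raw):
    """Turn an RFC 2047 encoded header value back into the original text."""
    return str(make_header(decode_header(raw)))


subject = "Grüße aus Köln"           # sample value, purely illustrative
wire = encode_header_7bit(subject)    # what an old 7-bit recipient would see

# The wire form is ugly but harmless to software that only expects ASCII,
# and a recipient that understands the label can recover the text exactly.
assert wire.isascii()
assert decode_header_value(wire) == subject
```

The encoded form is unpleasant to look at, which is exactly the trade described above: ugliness in exchange for not breaking anything already deployed.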

At 01:50 AM 2/5/99 +0000, D. J. Bernstein wrote:
The short-term goal is to allow messages with unencoded UTF-8 in the

4. When making a transition for a global installed base (and actually the rule applies equally for a much smaller base) there is no semantic content to the term "near-term".

Even the smallest change is measured in years. The longest in decades. Hence, serious efforts at transition must design a "transition" mode which is capable of being used permanently.

MIME's 7-bit encoding is an example. While one might claim that 8bitmime is a disaster, that claim misses the larger fact that we exchange binary data with each other today quite comfortably but could not do so before the MIME effort was started. The fact that we send it around "inefficiently" is irritating to some, but is clearly secondary to the fact that we can and do use the exchange mechanism successfully.

The long-term goal is to eliminate the implementation burden of multiple
character sets and =??=. This takes one extra step:

  (4) Mail writers have to convert all outgoing messages from the local
      character set to unencoded UTF-8.

Eventually all character-set markers can be removed.

5. Eventually, COBOL will disappear. Eventually, no one will ever program in assembly language. Eventually, we will have world peace...

So how do we factor in "eventually" to practical planning efforts?

Chris Newman writes:
* Create UTF8HEADER SMTP extension.  Provides RFC 2047 downgrading for
  both top level headers and nested message/rfc822 headers.

That doesn't survive a cost-benefit analysis.

Right.  Instead it just survives the test of global, practical demonstration.

In other words, the alternative doesn't survive a credibility analysis, unless you want to declare the world's movement to X.400-based UA's a success.

Your conversion is safe in a fantasy world where all message readers



Why would anyone find it distasteful to interact with such a participant? I just can't imagine.


For moving to UTF-8 headers, we need an ESMTP option and, I suppose, an 822(bis) header.
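
As a rough sketch of what "get permission first" could look like on the sending side: the sender inspects the receiver's EHLO reply and emits unencoded UTF-8 only when the option is advertised, falling back to RFC 2047 encoding otherwise. The keyword UTF8HEADER is borrowed from the proposal quoted earlier in this message and is not a registered extension; parse_ehlo, header_for_peer, and the sample replies are hypothetical names invented for illustration. Folding of long header values is ignored here.

```python
from email.header import Header

# Hypothetical ESMTP keyword, taken from the proposal quoted in this thread.
EXTENSION = "UTF8HEADER"


def parse_ehlo(reply):
    """Collect extension keywords from a multi-line 250 EHLO reply."""
    keywords = set()
    lines = reply.splitlines()
    for line in lines[1:]:            # the first line only names the server
        if not line.startswith("250"):
            continue
        rest = line[4:].strip()       # drop the "250-" or "250 " prefix
        if rest:
            keywords.add(rest.split()[0].upper())
    return keywords


def header_for_peer(name, value, ehlo_reply):
    """Build one header line as bytes, in a form this receiver has agreed to."""
    if value.isascii():
        body = value.encode("ascii")                  # nothing to negotiate
    elif EXTENSION in parse_ehlo(ehlo_reply):
        body = value.encode("utf-8")                  # explicit go-ahead given
    else:
        # No go-ahead: downgrade to the 7-bit, labeled RFC 2047 form.
        body = Header(value, charset="utf-8").encode().encode("ascii")
    return name.encode("ascii") + b": " + body


old_peer = "250-mx.example.org\r\n250-8BITMIME\r\n250 SIZE 1000000"
new_peer = "250-mx.example.org\r\n250-8BITMIME\r\n250 UTF8HEADER"

print(header_for_peer("Subject", "Köln", old_peer))
print(header_for_peer("Subject", "Köln", new_peer))
```

With a live connection, Python's smtplib offers SMTP.has_extn() for the same check after EHLO; the parsing above is spelled out only so the sketch runs without a network.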



Dave Crocker                                       Tel: +60 (19) 3299 445
<mailto:dcrocker(_at_)brandenburg(_dot_)com>             Post Office Box 296, 
                                        Serdang, Selangor 43400  MALAYSIA
Brandenburg Consulting
                         Tel: +1 (408) 246 8253
Fax: +1(408)273 6464             675 Spruce Dr., Sunnyvale, CA 94086  USA