ietf-822

Re: UTF-8 over RFC 2047 (Re: Call for Usefor to recharter)

2003-01-14 16:48:03

Russ Allbery writes:
My point is just that I'd like to see some reason to believe that all
the people sending unlabelled 8-bit content are interested in moving
to UTF-8.

It's a question of costs and benefits. How do we minimize the damage
when the user wants to send something more than ASCII?

The simplest thing to do is take 8-bit data from the user---let's say
8859-1---and transmit it as is, hoping that the recipient is using
8859-1 too. Of course, 8859-1 (like ASCII) is a deficient character set
and isn't used everywhere, so this strategy has a noticeable failure
rate.
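
To make the failure mode concrete, here is a minimal sketch of what
happens to unlabelled 8859-1 bytes at recipients with different local
character sets. The sample text, the display() helper, and the choice
of KOI8-R as the "other" charset are assumptions for illustration only.

```python
# Hypothetical sample text; any non-ASCII 8859-1 character would do.
text = "café"

# The sender transmits raw 8859-1 bytes with no charset label.
wire_bytes = text.encode("iso-8859-1")


def display(raw, local_charset):
    """Decode unlabelled bytes the way a recipient using local_charset would."""
    return raw.decode(local_charset, errors="replace")


same_charset = display(wire_bytes, "iso-8859-1")  # recipient also uses 8859-1
other_charset = display(wire_bytes, "koi8-r")     # recipient uses a Cyrillic charset
utf8_reader = display(wire_bytes, "utf-8")        # recipient expects UTF-8

print(same_charset, other_charset, utf8_reader)
```

Only the first recipient sees the text the sender typed. The other two
get a wrong character or a replacement character for the 8-bit byte.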

Encoding in RFC 2047 can make the display work for recipients with other
local character sets. However, it screws up the display for recipients
whose software doesn't go to the effort of decoding RFC 2047. For many
users, this is a loss, so many message writers don't do it. (More for
USENET than for email.)
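
A rough sketch of that trade-off, using Python's standard email.header
module as a stand-in for a message writer and a message reader. The
subject text is a made-up example, and real software may choose a
different encoding (Q or B) or line folding.

```python
from email.header import Header, decode_header, make_header

# Hypothetical header text containing one non-ASCII character.
subject = "café"

# What the message writer puts on the wire: an ASCII-only encoded-word
# that names its charset.
encoded = Header(subject, charset="iso-8859-1").encode()

# What a recipient sees if its software decodes RFC 2047.
decoded = str(make_header(decode_header(encoded)))

# What a recipient sees if its software does not decode RFC 2047:
# the encoded-word itself, shown verbatim.
undecoded = encoded

print(encoded)
print(decoded)
```

The decoding recipient gets the right text regardless of its local
character set. The non-decoding recipient gets the encoded-word in place
of the text, which is the loss described above.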

What's happening now is that more and more software is learning to
handle UTF-8. (I've heard a rumor that even sendmail, which has screwed
up bytes 128-159 for years, is finally going to start handling UTF-8.)
At some point, translating to UTF-8 will have a clearly higher success
rate than either 8859-1 or RFC 2047, so message writers will start doing
that. (Sooner for USENET than for email.)
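
For illustration, translating to UTF-8 is a small step for the message
writer. The sketch below assumes the user's local character set is
8859-1 and uses made-up sample strings. It also checks two facts about
UTF-8 that bear on the transition: ASCII text is unchanged, and some
characters are encoded using bytes in the 128-159 range, so software on
the path has to pass those bytes through untouched.

```python
# Bytes as typed by a user whose local charset is assumed to be 8859-1.
local_bytes = "café".encode("iso-8859-1")

# Translate to UTF-8 before sending.
utf8_bytes = local_bytes.decode("iso-8859-1").encode("utf-8")

# Pure ASCII text is byte-for-byte identical in UTF-8.
ascii_only = "plain text"
ascii_unchanged = ascii_only.encode("utf-8") == ascii_only.encode("ascii")

# Some UTF-8 sequences contain bytes in the range 128-159.
# Example character chosen for illustration: U+00C0 encodes as C3 80.
uses_128_159 = any(128 <= b <= 159 for b in "\u00c0".encode("utf-8"))

print(local_bytes, utf8_bytes, ascii_unchanged, uses_128_159)
```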

In the long run, UTF-8 will work everywhere. Of course, for a safe
transition, users should stick to ASCII until the UTF-8 support is in
place.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago