ietf-822

Re: UTF-8 over RFC 2047 (Re: Call for Usefor to recharter)

2003-01-15 10:12:58

In <20030114155828.40dd016e.moore@cs.utk.edu>
Keith Moore <moore@cs.utk.edu> writes:

The reason people disobey the
7-bit standard is that the 7-bit standard sucks.

True.  Then again, the obvious alternative also sucks - at least in
the short term.  Just because something sucks doesn't mean it's not
the best way to do something.

Though it is unlikely to be the best way to do something if people
routinely disobey it :-( .

But yes, I find myself in agreement with most of what you are saying here.

I do believe that migrating to utf-8 in message headers is the way
to go in the long term, and that a transition strategy similar to what
Dan is suggesting (support utf-8 on receipt soon, generation of utf-8
later) is ultimately the way to get there from here.  (I suspect we
have very different ideas about what would be a reasonable interval
between those two events)

And probably very little control over that interval, too :-( .
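To make the "support utf-8 on receipt first" step concrete, here is a minimal Python sketch of a receiving agent that accepts a header value either as raw 8-bit UTF-8 or as 7-bit RFC 2047 encoded-words and yields the same text in both cases. The function name `decode_header_value` and the Latin-1 fallback for bytes that are not valid UTF-8 are assumptions made for illustration. Neither Usefor nor RFC 2047 specifies them.

```python
from email.header import decode_header, make_header


def decode_header_value(raw: bytes) -> str:
    """Hypothetical receive-side decoder for one header field body.

    Accepts raw UTF-8 (the proposed future form) as well as plain ASCII
    carrying RFC 2047 encoded-words (the current form).
    """
    try:
        text = raw.decode("utf-8")  # strict: rejects malformed sequences
    except UnicodeDecodeError:
        # Assumption: treat non-UTF-8 8-bit bytes as Latin-1 so that
        # legacy unencoded headers still display something.
        text = raw.decode("latin-1")
    # Expand any RFC 2047 encoded-words (=?charset?B|Q?...?=) in the text.
    return str(make_header(decode_header(text)))


# The same subject, in the current 7-bit form and in raw UTF-8.
encoded = b"=?utf-8?q?caf=C3=A9?="
raw_utf8 = "café".encode("utf-8")
print(decode_header_value(encoded))   # café
print(decode_header_value(raw_utf8))  # café
```

Note that such a receiver still has to carry the RFC 2047 decoder; accepting UTF-8 adds a path rather than replacing one.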

But there are several things I don't believe.

One, that this will return us to a world where messages are ordinary
text and can be treated as such.  For instance, even if we allow utf-8
in message headers, there will still be a need for canonicalization of
certain fields before sending and/or before comparison of values
embedded in those fields,

Yes, individual protocols will impose such constraints as are necessary.
The Usefor draft has a lot to say about newsgroup-names, for example.
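As an illustration of the canonicalization point, consider two agents that both send the same newsgroup name in UTF-8. One uses a precomposed é (U+00E9), and the other uses e followed by a combining acute accent (U+0301). The two strings differ byte-for-byte, so a naive comparison fails. The minimal Python sketch below canonicalizes with Unicode NFC before comparing. Choosing NFC, rather than NFKC or some protocol-specific rule, is an assumption made for illustration, and the group name is hypothetical.

```python
import unicodedata


def same_field_value(a: str, b: str) -> bool:
    """Compare two header field values after NFC canonicalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "alt.caf\u00e9"      # hypothetical group name, precomposed é
decomposed = "alt.cafe\u0301"   # same name, e + COMBINING ACUTE ACCENT

print(composed == decomposed)                    # False
print(same_field_value(composed, decomposed))    # True
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 9 10
```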

Two, that this allows use of utf-8 in addresses or domain names.
Those are separate problems,

Yes. I think they should eventually be allowed (and will happen anyway).
But that is an issue for the email standards. All addresses and domain
names in Usefor currently follow RFC 2822 exactly, and it is not proposed
to change that.

Three, that the existing ability of some user agents to display utf-8
in message headers is sufficient for proper processing of headers 
containing utf-8,

Four, that user agents that support utf-8 receipt in the near term will
receive sufficient testing through ordinary usage to be reliable at
supporting it once generation of utf-8 in headers is endorsed.
(so some other testing will be needed - not the way IETF normally works)

Support for and acceptance of Unicode (whether as UTF-16 or UTF-8) seems
to be rising exponentially. It started from a very low level 10 years ago
(as Dan keeps reminding us), has now reached the point of being very
visible, and will therefore (if the exponential is followed) be almost
universal in no time at all. No doubt early implementations got some
things wrong (normalization among them), but we can expect those things
to be put right, since the Unicode standards are pretty clear.

Five, that supporting utf-8 rids user agents of the burden
of supporting 2047, IDNA, and similar encodings, at least for
reading/presentation purposes, probably within our lifetimes.

Probably sooner than you would think (or like) if my exponential
hypothesis is correct.
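For comparison, consider the sending side. Until generation of raw UTF-8 is endorsed, a sender must still wrap non-ASCII header text in RFC 2047 encoded-words. Python's standard `email.header.Header` class does this. The sketch below uses a hypothetical helper, `wire_forms`, to contrast that 7-bit form with the raw UTF-8 bytes that the later step would permit. The choice between B and Q encoding is left to the library.

```python
from email.header import Header, decode_header, make_header


def wire_forms(text: str) -> tuple[str, bytes]:
    """Return (RFC 2047 form for today's 7-bit headers, raw UTF-8 form)."""
    encoded = Header(text, "utf-8").encode()  # ASCII-only encoded-word(s)
    return encoded, text.encode("utf-8")


seven_bit, eight_bit = wire_forms("Zürich")
print(seven_bit)                     # an ASCII-only =?utf-8?...?= word
print(eight_bit)                     # b'Z\xc3\xbcrich'
# A receiver must decode the encoded-word to recover the text:
print(str(make_header(decode_header(seven_bit))))  # Zürich
```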

In other words, utf-8 is the right way to go for the long term - 
but the devil is in the details and it's not nearly as simple
as it looks.  And I do think that a two-step transition is necessary -
we can't just start cramming utf-8 into either email or usenet and
expect things to work reasonably.

I also think that usenet can serve as a good test case for email.
(translation: I'd much rather usenet suffer the initial disruption than
email, so email could perhaps learn from usenet's experience.)

Yes, and Usenet (or rather Usefor) is quite willing to stick its neck
out and be the guinea pig. It is an environment that is much more tolerant
of interoperability failures (or, if you like, it has lived with them for
years) and so is much better placed to cope with that initial disruption.
And we are quite willing to try to keep that disruption out of email so
far as we can.
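Keith's warning that we cannot just start cramming utf-8 into headers is easy to demonstrate. A reader that has not been upgraded and that happens to assume ISO-8859-1 for any 8-bit header bytes will display raw UTF-8 as mojibake. The Python sketch below simulates such a reader. The Latin-1 assumption is ours, and real legacy agents may behave differently.

```python
def legacy_display(raw: bytes) -> str:
    """Simulate a hypothetical pre-UTF-8 reader that assumes ISO-8859-1."""
    return raw.decode("latin-1")


subject = "Café in Zürich"
wire = subject.encode("utf-8")   # what a UTF-8-generating sender would emit

print(legacy_display(wire))      # CafÃ© in ZÃ¼rich
print(wire.decode("utf-8"))      # Café in Zürich
```

This is why support on receipt needs to be widespread before generation is endorsed.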

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
