ietf-822

Review of the UTF-8 transition

2003-02-14 12:43:41

Bruce Lilly writes:
> if there's a desire to move to utf-8, use of untagged 8-bit content
> will have to cease first so that when generation of untagged utf-8 is
> eventually permitted, one can be absolutely assured that such untagged
> 8-bit content *is* utf-8

No. Here, once again, is the (blazingly obvious) transition plan.

The first step in the transition is to change message handlers to allow
UTF-8 messages, if they don't already. In particular, message readers
have to display messages as UTF-8 if they look like UTF-8. Impact:

   * In the IETF universe, this is a perfectly safe step, since 8-bit
     messages have no previous use.

   * In the real world, there are many messages in (e.g.) ISO 8859-1.
     However, those messages generally don't follow the UTF-8
     byte-sequence restrictions (every byte with the high bit set must
     be part of a well-formed multi-byte sequence), so they aren't
     affected by the change. Practically all messages following the
     UTF-8 restrictions are, in fact, UTF-8; the change means that they
     are displayed correctly.
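
As an illustration only, here is a minimal Python sketch of the
"display as UTF-8 if it looks like UTF-8" rule from the first step. The
function name decode_untagged and the choice of ISO 8859-1 as the
fallback are assumptions made for this example, not part of any
standard or existing mail reader.

```python
def decode_untagged(body: bytes, fallback: str = "iso-8859-1"):
    """Decode an untagged 8-bit message body (hypothetical helper).

    Returns (text, charset_used). The body is treated as UTF-8 only if
    every byte sequence in it satisfies the UTF-8 restrictions;
    otherwise the fallback charset is used (an assumed local default).
    """
    try:
        # Strict decoding rejects stray high-bit bytes, truncated
        # sequences, and overlong encodings.
        return body.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Not well-formed UTF-8: keep the pre-transition behaviour.
        return body.decode(fallback, errors="replace"), fallback


if __name__ == "__main__":
    for raw in ("café".encode("utf-8"), "café".encode("iso-8859-1")):
        text, charset = decode_untagged(raw)
        print(charset, text)
```

The ISO 8859-1 encoding of "café" ends in the lone byte 0xE9, which
UTF-8 permits only as the start of a three-byte sequence, so that body
fails the strict check and is displayed exactly as it was before the
change. Pure ASCII bodies pass the check as well, which is harmless
because ASCII decodes identically under both charsets.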

The second step is to allow message generators to use UTF-8. Impact:

   * In the IETF universe, this can wait until after the first step is
     done, because users are all perfectly happy with ASCII right now
     and would never consider blatantly violating the standards. Using
     UTF-8 is perfectly safe once the first step is done.

   * In the real world, message generators will naturally switch to
     UTF-8 from (e.g.) 8859-1, once that's beneficial for the users.
     This will happen once the (shrinking) UTF-8 failure rate drops
     below the (fairly small but never shrinking) 8859-1 failure rate.

In the end, the IETF universe and the real world will end up at the same
place. The current mess of incompatible character encodings, and its
associated failure rate, will disappear. UTF-8 will be supported and
used everywhere.

Your silly notion of being ``absolutely assured that ... untagged 8-bit
content is utf-8'' has no relevance to the transition. It won't be true
in the real world until long after the transition is complete.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago
