ietf-822

10646, UTF-2, etc.

1993-02-06 18:36:18
In his presentation of the Plan 9 UTF-2 work at the recent Usenix
conference, Rob Pike made an interesting point that is quite relevant
to the assorted discussions about >8-bit character sets.  He said
(roughly) "the hard part is making the code understand that octets
and characters are not synonymous".  Once that is done, the details --
how the two are related, whether a character is 16 or 32 bits, etc. --
are very much secondary, particularly if libraries etc. are designed
to hide the implementation details properly.  They changed from the old
10646-appendix UTF to UTF-2 in an afternoon:  header files and
libraries were replaced, a big recursive "make" was done to rebuild
the software, and a little program ran around finding UTF disk files
and converting them in place.

He also noted that there was one visible benefit from switching to
UTF-2:  a lot of bugs disappeared.
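
To make the octets-versus-characters distinction concrete, here is a minimal
sketch. It is not the Plan 9 code: it assumes Python's built-in UTF-8 and
UTF-16 codecs as stand-ins for the encodings under discussion, and the names
char_count and octet_count are hypothetical helpers invented for illustration.

```python
# Illustrative sketch only. Python's standard codecs stand in for the
# encodings discussed above; char_count and octet_count are hypothetical
# helper names, not part of any real library.

TEXT = "na\u00efve caf\u00e9"  # "naive cafe" with two accented letters


def octet_count(data: bytes) -> int:
    """Number of octets in the encoded form."""
    return len(data)


def char_count(data: bytes, encoding: str) -> int:
    """Number of characters, found by decoding before counting."""
    return len(data.decode(encoding))


as_utf8 = TEXT.encode("utf-8")
as_utf16 = TEXT.encode("utf-16-le")

# The octet counts differ between encodings, and neither equals the
# character count; the character count is the same either way.
print(octet_count(as_utf8), char_count(as_utf8, "utf-8"))
print(octet_count(as_utf16), char_count(as_utf16, "utf-16-le"))
```

Code that calls only char_count never learns how many octets a character
occupies, so swapping the encoding underneath it changes the library and not
the callers.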

                                         Henry Spencer at U of Toronto Zoology
                                          
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry
