ietf-822

Re: 10646, UTF-2, etc.

1993-02-07 04:01:13
> In his presentation of the Plan Nine UTF-2 work at the recent Usenix
> conference, Rob Pike made an interesting point that is quite relevant
> to the assorted discussions about >8-bit character sets.  He said
> (roughly) "the hard part is making the code understand that octets
> and characters are not synonymous".  Once that is done, the details --
> how the two are related, whether a character is 16 or 32 bits, etc. --
> are very much secondary, particularly if libraries etc. are designed
> to hide the implementation details properly.

Untrue. Whether a character is 16 or 32 bits is not an implementation detail.

With a 16-bit wchar_t, you can write

        array[(unsigned)char_code]

and your program should work on most modern machines, since the array
needs only 2^16 = 65,536 entries.

With a 32-bit wchar_t, it is often impossible to write

        array[(unsigned)char_code]

because the array would need 2^32 entries, and hardly any machine has
more than 4 GB of virtual memory.

> They changed from the old
> 10646-appendix UTF to UTF-2 in an afternoon:

They both use 16-bit characters.

                                                Masataka Ohta
