ietf-822

Re: UTF-8 over RFC 2047 (Re: Call for Usefor to recharter)

2003-01-09 05:12:22

In <101176831189(_dot_)20030107120152(_at_)brandenburg(_dot_)com> Dave Crocker 
<dcrocker(_at_)brandenburg(_dot_)com> writes:

Correct me if I am wrong, but I believe the "native" representation for
Unicode is something like 24 bits.  (No, folks, please don't correct me if
it is any other number over 16.)

Hence, UTF-16 and UTF-8 are methods of encoding a larger bit space into a
smaller representation space, producing variable-length strings.  One crams
the larger space into a 16-bit world.  The other crams it into an 8-bit
world.

Depending on the situation, there can be processing or space
efficiencies gained by one encoding over another.

UTF-8 is space efficient if you still expect to have a large proportion of
ASCII mixed in with it. It compares quite well with UTF-16 for European
languages, but I can imagine it would be less efficient in Chinese.
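
The space comparison is easy to check for oneself. The following is only a
rough sketch: the three sample strings are my own illustrative choices, not
anything measured on real traffic, and UTF-16 is counted without a byte
order mark.

```python
# Illustrative samples (assumed, not from any corpus), written with escapes
# so the source file itself stays pure ASCII.
samples = {
    "english": "Hello, world",
    "german": "Gr\u00fc\u00dfe aus K\u00f6ln",           # u-umlaut, sharp s, o-umlaut
    "chinese": "\u4f60\u597d\uff0c\u4e16\u754c",         # five BMP characters
}

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text; UTF-16 is counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name}: {len(text)} code points, UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

With these samples UTF-8 is half the size of UTF-16 for pure ASCII, still
smaller for the German text, and half as large again as UTF-16 for the
Chinese text.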

But there is no theoretical or aesthetic superiority that can be claimed by
one over the other.

There are some practical superiorities. UTF-8 transports well over many
Internet protocols, because it does not do nasty things with CR, LF and
NUL. It has the _whole_ of ASCII as a strict subset (which is not true of
UTF-7).
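
A small sketch of what I mean, using Python's standard codecs. The helper
name has_transport_hazard and the test characters are my own choices for
illustration.

```python
def has_transport_hazard(data):
    """True if the byte string contains a NUL, LF or CR octet."""
    return any(b in (0x00, 0x0A, 0x0D) for b in data)

# Two non-ASCII letters whose code points end in 0x0A and 0x0D.
text = "\u010a\u010d"

utf8 = text.encode("utf-8")        # every octet of a multi-octet sequence is >= 0x80
utf16 = text.encode("utf-16-be")   # the low octets 0x0A and 0x0D appear on the wire

print(has_transport_hazard(utf8))   # no CR, LF or NUL introduced
print(has_transport_hazard(utf16))  # LF and CR octets present

# ASCII text is byte-for-byte identical in UTF-8 ...
assert "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")
# ... whereas UTF-7 has to escape the ASCII character '+'.
print("a+b".encode("utf-7"))
```

The same check on plain ASCII text encoded as UTF-16 finds a NUL octet in
every character, which is the other half of the problem.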

But UTF-8 would be a pain for internal use in Operating systems, and maybe
even in file storage. UTF-16 is reasonable for such internal use until you
come to the code points beyond 0xFFFF, but those points are little used
(well, that's the theory, and Bill Gates seems to have committed himself
to that theory).
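
For anyone who has not met the awkward case beyond 0xFFFF, here is a
minimal sketch of the surrogate-pair arithmetic UTF-16 uses. The function
name utf16_code_units is mine, and the sketch assumes its argument is a
valid code point outside the surrogate range itself.

```python
def utf16_code_units(cp):
    """Return the list of 16-bit code units that encode code point cp in UTF-16.

    Assumes 0 <= cp <= 0x10FFFF and that cp is not itself a surrogate
    (0xD800..0xDFFF); no validation is done in this sketch.
    """
    if cp < 0x10000:
        return [cp]                    # one unit: the code point itself
    v = cp - 0x10000                   # 20 bits left to distribute
    high = 0xD800 + (v >> 10)          # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)         # bottom 10 bits go in the low surrogate
    return [high, low]

print([hex(u) for u in utf16_code_units(0x20AC)])    # inside the 16-bit range
print([hex(u) for u in utf16_code_units(0x1D11E)])   # beyond 0xFFFF: two units
```

So any code that assumes one 16-bit unit per character is correct only for
as long as those higher code points stay out of its data.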


The confusion on this issue probably stems from the fact that you can use
existing data viewers -- such as text editors -- to view the result of a
7-bit encoding and cannot use such "legacy" services for viewing UTF-8 or
UTF-16.

There are many editors that will display _something_ useful for any 8-bit
stuff. There are a smaller number that will actually show you and let you
edit the full Unicode characters.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5