ietf-822

Re: UTF-8 versions (was: Re: RFC 2047 and gatewaying)

2003-01-12 11:20:43

The message below basically argues that Unicode (and UTF-8) is big enough for the foreseeable future, so that no change to the UTF-8 specification will be necessary.

If you don't care about Unicode, just stop reading here.

Bruce Lilly writes:
Arnt Gulbrandsen wrote:
Why would you expect Unicode to change substantively?

The 3.0->3.1 experience. A.k.a. "once burned, twice shy".

That change was basically an admission that 64k wasn't enough. It is still possible that some bigger number is enough. The Unicode Consortium believes that 17*64k is enough, and I agree.

The number of characters used for human communication doesn't seem to be rising much, and there's plenty of space left in the current specification. IIRC Unicode still uses less than 200,000 of the million-odd possible code points.
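For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 17 planes of 64k come from the figures above; the 200,000 is only my from-memory upper bound, not an exact count, and the names (fraction_used, utf8_length) are made up for illustration.

```python
# Sizes taken from the message: 17 planes of 64k code points each.
PLANES = 17
PLANE_SIZE = 64 * 1024                      # 65,536 code points per plane

TOTAL_CODE_POINTS = PLANES * PLANE_SIZE     # the "million-odd" code space
LAST_CODE_POINT = TOTAL_CODE_POINTS - 1     # highest code point, 0x10FFFF

# Assumption: a from-memory upper bound on code points in use, not an exact count.
ASSUMED_USED = 200_000


def fraction_used(used: int = ASSUMED_USED) -> float:
    """Share of the code space taken up by `used` code points."""
    return used / TOTAL_CODE_POINTS


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    print(f"total code points: {TOTAL_CODE_POINTS:,}")
    print(f"last code point:   U+{LAST_CODE_POINT:X}")
    print(f"share used (upper bound): {fraction_used():.1%}")
    print(f"UTF-8 bytes for the last code point: {utf8_length(LAST_CODE_POINT)}")
```

If the 200,000 figure is roughly right, the share in use stays under a fifth of the code space, and the highest code point still fits in four UTF-8 bytes.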

Famous last words. From my handy dead-tree copy of Unicode 2.0, page 2-4, under the "Full Encoding" heading:

"There are over 18,000 unassigned code positions that are available for future allocation. This number far exceeds anticipated character encoding requirements for all world characters and symbols."

Yep. I have that too. The fact that 18,000 isn't enough doesn't mean that about a million isn't enough.

Cough, cough. It is nearly a universal truth that things tend to expand to fill the available space (and/or time). Why do you (apparently) think that Unicode is exempt?

I don't. I do think that people's ability/willingness to learn characters is a (much) stricter limitation than the number of available code points in Unicode.

Some people will invent new scripts for some languages, but I doubt _many_ characters will be added in this way. The costs of teaching kids big alphabets are too high, for a start.

Some people will take books which mix a "dance notation" font with English, write up a proposal adding those characters, and submit it. Or the chess notation used in the newspaper's chess column. That doesn't add up to much either - the number of characters added in that way is limited to what the font vendors and publishers use, and what the audience(s) will learn.

I suppose you could argue that Unicode adds alphabets. But do you think Unicode still hasn't reached the 20% mark?

They add more than "alphabets", and that's part of the problem. Again quoting Unicode 2.0 (page 1-3 this time):

"Graphologies unrelated to text, such as musical and dance notations, are outside the scope of the Unicode Standard."

"Unrelated to text". If something like that is used in books intermingled with English text, it's hard to say that it's unrelated to English text.

--Arnt
