Re: UTF-8 versions (was: Re: RFC 2047 and gatewaying)
2003-01-12 11:20:43
The text below argues that Unicode (and UTF-8) is big enough for the
foreseeable future, so that no change to the UTF-8 specification will
be necessary.
If you don't care about Unicode, just stop reading here.
Bruce Lilly writes:
Arnt Gulbrandsen wrote:
Why would you expect Unicode to change substantively?
The 3.0->3.1 experience. A.k.a. "once burned, twice shy".
That change was basically an admission that 64k wasn't enough. It is
still possible that some bigger number is enough. The Unicode
Consortium believes that 17*64k is enough, and I agree.
The number of characters used for human communication doesn't seem to
be rising much, and there's plenty of space left in the current
specification. IIRC Unicode still uses less than 200,000 of the
million-odd possible code points.
Famous last words. From my handy dead-tree copy of Unicode 2.0, page
2-4, under the "Full Encoding" heading:
"There are over 18,000 unassigned code positions that are available
for future allocation. This number far exceeds anticipated character
encoding requirements for all world characters and symbols."
Yep. I have that too. The fact that 18,000 isn't enough doesn't mean
that about a million isn't enough.
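
For anyone who wants to check the arithmetic behind "17*64k" and "about a
million", here is a small sketch. The constant and function names are
made up for illustration, and the 200,000 figure is only the upper bound
quoted from memory in this thread, not an official count.

```python
# Rough arithmetic for the code-space sizes discussed in this thread.
# All names here are illustrative; none come from any specification.

PLANE_SIZE = 1 << 16           # "64k": code points in one 16-bit plane
FULL_RANGE = 17 * PLANE_SIZE   # "17*64k": planes 0 through 16


def fraction_used(assigned: int, total: int = FULL_RANGE) -> float:
    """Return the share of the code space taken by `assigned` code points."""
    return assigned / total


if __name__ == "__main__":
    print(FULL_RANGE)                        # 1114112
    print(hex(FULL_RANGE - 1))               # 0x10ffff, the highest code point
    # 200_000 is the "less than 200,000" upper bound mentioned above.
    print(f"{fraction_used(200_000):.1%}")   # 18.0%
    # The highest code point still fits in a four-byte UTF-8 sequence.
    print(len(chr(FULL_RANGE - 1).encode("utf-8")))  # 4
```

So even taking the 200,000 upper bound at face value, usage stays under
the 20% mark.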
Cough, cough. It is nearly a universal truth that things tend to
expand to fill the available space (and/or time). Why do you
(apparently) think that Unicode is exempt?
I don't. I do think that people's ability/willingness to learn
characters is a (much) stricter limitation than the number of available
code points in Unicode.
Some people will invent new scripts for some languages, but I doubt
_many_ characters will be added in this way. The costs of teaching kids
big alphabets are too high, for a start.
Some people will take books which mix a "dance notation" font with
English, write up a proposal adding those characters, and submit it. Or
the chess notation used in the newspaper's chess column. That doesn't
add up to much either - the number of characters added in that way is
limited to what the font vendors and publishers use, and what the
audience(s) will learn.
I suppose you could argue that Unicode adds alphabets. But do you
think Unicode still hasn't reached the 20% mark?
They add more than "alphabets", and that's part of the problem. Again
quoting Unicode 2.0 (page 1-3 this time):
"Graphologies unrelated to text, such as musical and dance notations,
are outside the scope of the Unicode Standard."
"Unrelated to text". If something like that is used in books
intermingled with English text, it's hard to say that it's unrelated to
English text.
--Arnt