ietf-822

Unicode principles (Was Re: UTF-8 versions (was: Re: RFC 2047 and gatewaying))

2003-01-12 15:45:14

Arnt Gulbrandsen wrote:

That change was basically an admission that 64k wasn't enough. It is still possible that some bigger number is enough. The unicode consortium believes that 17*64k is enough, and I agree.

No, please read the entire "Full Encoding" section of Unicode 3.0 or
earlier (it's only a few paragraphs).  The number of code
points (including Unicode 3.0 and earlier surrogate pairs) has not
changed.  What has changed is that the surrogate pairs (somewhat
analogous to shift codes in some character sets) have been effectively
converted to "native" 32-bit codes.  And that changed Unicode from a
uniform 16-bit code set; look at the number of changes in 3.1 that
altered the "16-bit" text of earlier Unicode.
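To make the surrogate-pair mechanism concrete, here is a small sketch of the standard UTF-16 decoding arithmetic (the helper name is mine, not from the thread): a high surrogate (D800-DBFF) and a low surrogate (DC00-DFFF) combine into one code point above U+FFFF.

```python
def decode_surrogate_pair(high, low):
    """Combine a high surrogate (D800-DBFF) and a low surrogate (DC00-DFFF)
    into a single supplementary code point (U+10000 - U+10FFFF)."""
    assert 0xD800 <= high <= 0xDBFF, "not a high surrogate"
    assert 0xDC00 <= low <= 0xDFFF, "not a low surrogate"
    # Each surrogate contributes 10 bits; the pair covers 2^20 code points
    # starting at U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# Example: U+1D11E (MUSICAL SYMBOL G CLEF) is D834 DD1E in UTF-16.
assert decode_surrogate_pair(0xD834, 0xDD1E) == 0x1D11E
```

(Fittingly for this thread, the example character is a Western musical symbol from the supplementary planes.)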

"There are over 18,000 unassigned code positions that are available for future allocation. This number far exceeds anticipated character encoding requirements for all world characters and symbols."


Yep. I have that too. The fact that 18,000 isn't enough doesn't mean that about a million isn't enough.

Now you've conflated *unassigned* code points with *total* code
points.  And as noted above, that's 16-bit native code positions; the
text continues to point out that even early Unicode had over a million
code points available via surrogate pairs.
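The "over a million" figure follows directly from the arithmetic: 1024 high surrogates times 1024 low surrogates, on top of the 16-bit Basic Multilingual Plane, which is exactly the "17*64k" mentioned above. A quick check:

```python
bmp = 0x10000                # 65,536 code points in the 16-bit BMP
supplementary = 1024 * 1024  # 1,048,576 code points reachable via surrogate pairs
total = bmp + supplementary  # 1,114,112 = 17 * 65,536 ("17*64k")
assert total == 17 * 0x10000 == 1114112
```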


That doesn't add up to much either

A piece of straw doesn't weigh much -- ever hear of "the straw that broke
the camel's back"?
"Graphologies unrelated to text, such as musical and dance notations, are outside the scope of the Unicode Standard."


"Unrelated to text". If something like that is used in books intermingled with English text, it's hard to say that it's unrelated to English text.

Had the Unicode Consortium started with the premise that it was encoding glyphs
and that anything that had ever appeared on a display, on paper, carved in clay
or stone, or scratched in sand was fair game, that would be one thing. But it
didn't; it started with the principle of encoding text (not limited to English)
*characters*, not glyphs.

So we now have Western musical notes which had been deemed out of scope. How
long do you suppose it will be before some group that uses non-Latin text
characters pipes up and says "wait a minute, that's *Western* musical
notation; we use something different, so you have to include our notation
also"? So we'll have a handful of Asian musical notations, a bunch of Native
American musical notations, some Australian aboriginal musical notations,
and so on.  All unrelated to text.

Publishing in books shouldn't be the main criterion for whether something
is part of Unicode.  The criterion, first and foremost, should be "is it text?".
Then "is it fundamentally different from what is already in Unicode, or is
it a stylistic variant?"; e.g. different font styles or different han
styles.  Musical notation never belonged in Unicode, as originally stated.
It does not fall within the scope of the issues that Unicode was intended
to address -- people don't need to look up strings of musical notes in a
dictionary, arrange them in a collating sequence, etc.

In any event, we're getting pretty far off topic here.

