"Bruce" == Bruce Lilly <blilly(_at_)erols(_dot_)com> writes:
Bruce> No, please read the entire "Full Encoding" section of Unicode
Bruce> 3.0 or earlier (it's only a few paragraphs). The number of
Bruce> code points (including Unicode 3.0 and earlier surrogate
Bruce> pairs) has not changed. What has changed is that the
Bruce> surrogate pairs (somewhat analogous to shift codes in some
Bruce> character sets) have been effectively converted to "native"
Bruce> 32-bit codes. And that changed Unicode from a uniform 16-bit
Bruce> code set; look at the number of changes in 3.1 that altered
Bruce> the "16-bit" text of earlier Unicode.
From reading the Unicode docs, the change seems unsurprising given that
Unicode has not been what I would call a "uniform 16-bit code set"
(which I would define as one in which all characters were represented
with exactly 16 bits) for as long as it has had surrogate pairs.
As long as there were no characters assigned in the range above
U+FFFF, the Unicode spec could continue to ignore this discrepancy;
but the allocation of such codes obviously makes that impossible.
I suspect this is historical; Unicode seems to have been designed on
the assumption that 16 bits was enough, and that UTF-16 would be the
primary interchange format.
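
To make the discrepancy concrete, here is a rough sketch in Python of how
UTF-16 represents a single character. The function name utf16_code_units
is my own invention for illustration, not anything from the Unicode spec
or a library; the arithmetic follows the usual surrogate-pair mapping as
I understand it.

```python
def utf16_code_units(ch):
    """Return the list of 16-bit code units UTF-16 uses for one character.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)
    if cp <= 0xFFFF:
        # Characters up to U+FFFF fit in a single 16-bit unit.
        return [cp]
    # Above U+FFFF: subtract 0x10000, then split the remaining 20 bits
    # into a high (lead) surrogate and a low (trail) surrogate.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    for ch in ("A", "\U00010000", "\U0001D11E"):
        units = utf16_code_units(ch)
        print("U+%04X ->" % ord(ch), " ".join("%04X" % u for u in units))
```

Anything at or below U+FFFF comes out as one unit, anything above it as
two, so "exactly 16 bits per character" only holds while nothing is
assigned up there.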
--
Andrew.