Re: Unicode principles (Was Re: UTF-8 versions (was: Re: RFC 2047 and g

"Bruce" == Bruce Lilly <blilly(_at_)erols(_dot_)com> writes:


 Bruce> No, please read the entire "Full Encoding" section of Unicode
 Bruce> 3.0 or earlier (it's only a few paragraphs).  The number of
 Bruce> code points (including Unicode 3.0 and earlier surrogate
 Bruce> pairs) has not changed.  What has changed is that the
 Bruce> surrogate pairs (somewhat analogous to shift codes in some
 Bruce> character sets) have been effectively converted to "native"
 Bruce> 32-bit codes.  And that changed Unicode from a uniform 16-bit
 Bruce> code set; look at the number of changes in 3.1 that altered
 Bruce> the "16-bit" text of earlier Unicode.

Reading the Unicode docs, the change seems unsurprising given that
Unicode has not been what I would call a "uniform 16-bit code set"
(which I would define as one in which all characters were represented
with exactly 16 bits) for as long as it has had surrogate pairs.

As long as there were no characters assigned in the range above
U+FFFF, the Unicode spec could continue to ignore this discrepancy;
but the allocation of such codes obviously makes that impossible.
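
To make the discrepancy concrete, here is a minimal sketch of the
standard UTF-16 surrogate arithmetic. The function name
utf16_code_units is made up for illustration and comes from no
particular library. It shows that a code point at or below U+FFFF
takes one 16-bit unit, while anything above it takes two:

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point.

    Illustrative helper only; the name is hypothetical.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point <= 0xFFFF:
        # Fits in a single 16-bit unit.
        return [code_point]
    # Above U+FFFF: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


if __name__ == "__main__":
    for cp in (0x0041, 0xFFFD, 0x10000, 0x1D11E):
        units = utf16_code_units(cp)
        print("U+%04X -> %s" % (cp, " ".join("%04X" % u for u in units)))
```

So a program that assumes "one character = one 16-bit unit" works
only until it meets a character outside the first 65536 code points.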

I suspect this is historical; Unicode seems to have been designed on
the assumption that 16 bits was enough, and that UTF-16 would be the
primary interchange format.

-- 
Andrew.
