Re: Encode, take five

On Wed, 13 Sep 2000, Matt Sergeant wrote:

Until someone extends the Unicode character set beyond the current range,


This has "already" happened. Have a look at
http://www.unicode.org/unicode/alloc/Pipeline.html , the Unicode
allocation pipeline of proposed new characters and scripts. It lists
quite a few scripts beyond U+FFFF, several of which are "Accepted" by the
Unicode Technical Committee (some as much as three years ago) and some in
various stages of the ISO pipeline. While it may take a while before these
become canon (and some may get thrown out along the way), it's not as if
everything after U+FFFF is empty as far as the eye can see, with nothing
on the horizon.

UCS-2 and UTF-16 currently have a one-to-one mapping. I assume that's the
point being made. An excerpt from the book I'm currently tech reviewing:

  Nonetheless Unicode does provide a means of representing code points
  beyond 65,535 by recognizing certain two-byte sequences as half of a
  surrogate pair. A Unicode document that uses UCS-2 plus surrogate
  pairs is said to be in the UTF-16 encoding. Since no software
  currently supports or produces surrogate pairs, and since no scripts


I'll grant you that, especially considering the sort of things proposed
for Plane 1 (Deseret Alphabet, Musical Symbols, etc.). And, of course, not
many people will be taking advantage of these code points since they're
not finalised.

  are encoded in Unicode with code points above 65,535, the
  distinction between UCS-2 and UTF-16 is mostly academic at this
  point in time.


At this point in time, yes. I suppose I just wanted to point out that this
*may* change, at some unspecified (and maybe even distant) point in the
future.
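
For anyone who wants to see what the surrogate mechanism in that excerpt
amounts to, here is a minimal sketch in Python of the UTF-16 arithmetic.
The function names are my own invention for illustration, and U+10400 is
just an arbitrary Plane 1 value picked as a sample, not a claim about any
particular character assignment.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encodable range")
    v = cp - 0x10000              # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x10400)
    print("U+10400 -> %04X %04X" % (high, low))
```

Anything at or below U+FFFF is stored as a single 16-bit unit in both
UCS-2 and UTF-16, which is why the two only differ once code points above
that limit are actually in use.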

Cheers,
Philip
-- 
Philip Newton <newton(_at_)newton(_dot_)digitalspace(_dot_)net>
