UCS-2 and UTF-16 [was Re: Encode, take five]


    Philip> On 12 Sep 2000, at 18:42, Jarkko Hietaniemi wrote:
    >> UTF-16 is also known as UCS-2, 16 bit or 2-byte chunks,

    Philip> As I understand it, that's not true -- UTF-16 is 2-byte *or*
    Philip> 4-byte chunks, since UTF-16 contains surrogates (high-surrogate +
    Philip> low-surrogate [or the other way around?] = 1 character,
    Philip> represented with four bytes). UCS-2, OTOH, is always two bytes.

True, UTF-16 is not known as UCS-2.  However, UTF-16 still consists of 2-byte
chunks; a character outside the BMP simply takes two of them.  It is
essentially UCS-2 plus high and low surrogates (see the Unicode Standard 3.0,
page 19).  Combining a high and a low surrogate yields a UCS-4 value (or a
UTF-32 value, until the unavailable 10646 private use regions are removed).
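
To make the surrogate arithmetic concrete, here is a minimal sketch in Python.
The function names combine_surrogates and split_code_point are made up for
illustration and are not taken from Encode or any other library.  It assumes
the usual ordering, with the high surrogate (D800..DBFF) first and the low
surrogate (DC00..DFFF) second.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate into one UCS-4 code value.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate contributes 10 bits; the result starts at 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_code_point(cp: int) -> tuple[int, int]:
    """Split a code value above the BMP into (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not representable as a surrogate pair: {cp:#x}")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# U+1D11E is encoded in UTF-16 as the two 2-byte chunks D834 DD1E.
clef = combine_surrogates(0xD834, 0xDD1E)
pair = split_code_point(0x1D11E)
```

Each chunk is still two bytes wide, so the pair takes four bytes in total,
while a UCS-2 value never needs more than one chunk.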
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM  88003                            -- Robert Bresson
