Philip> On 12 Sep 2000, at 18:42, Jarkko Hietaniemi wrote:
>> UTF-16 is also known as UCS-2, 16 bit or 2-byte chunks,
Philip> As I understand it, that's not true -- UTF-16 is 2-byte *or*
Philip> 4-byte chunks, since UTF-16 contains surrogates (high-surrogate +
Philip> low-surrogate [or the other way around?] = 1 character,
Philip> represented with four bytes). UCS-2, OTOH, is always two bytes.
True, UTF-16 is not known as UCS-2. However, UTF-16 still consists of 2-byte
chunks: it is essentially UCS-2 plus high and low surrogates (see the Unicode
Standard 3.0, page 19). A high surrogate followed by a low surrogate combines
into a single UCS-4 value (or, strictly, a UTF-32 value, until the 10646
private use regions that surrogates cannot reach are removed).
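
For what it's worth, here is a rough Python sketch of how a surrogate pair
combines into one 32-bit value and back. The function names are made up for
illustration; only the arithmetic (offset 0x10000, 10 bits per surrogate) and
the surrogate ranges come from the standard.

```python
# Surrogate ranges as defined by the Unicode Standard.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def combine_surrogates(high, low):
    """Combine a high and a low surrogate (in that order) into one code point."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("not a high surrogate: %#06x" % high)
    if not LOW_START <= low <= LOW_END:
        raise ValueError("not a low surrogate: %#06x" % low)
    # Each surrogate carries 10 bits; the result is offset past the 16-bit range.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def split_code_point(cp):
    """Split a code point above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point needs no surrogates or is out of range")
    offset = cp - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


if __name__ == "__main__":
    high, low = split_code_point(0x1D11E)
    print("%04X %04X" % (high, low))
    print("%X" % combine_surrogates(high, low))
```

Every chunk is still 2 bytes wide; a character outside the 16-bit range simply
takes two of them, which is where the four bytes come from. The largest value
a pair can reach is 0x10FFFF, well short of the full UCS-4 space.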
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

Cinema, radio, television, magazines are a school of inattention:
people look without seeing, listen without hearing.
    -- Robert Bresson