Re: AL32UTF8

On Thu, Apr 29, 2004 at 09:23:45PM +0300, Jarkko Hietaniemi wrote:
: Tim Bunce wrote:
: 
: > Am I right in thinking that perl's internal utf8 representation
: > represents surrogates as a single (4 byte) code point and not as
: > two separate code points?
: 
: Mmmh.  Right and wrong... as a single code point, yes, since the real
: UTF-8 doesn't do surrogates which are only a UTF-16 thing.  4 bytes, no,
: 3 bytes.

No, Tim's right--they're four bytes.  It's only the individual
surrogates (U+D800 through U+DFFF, each encoded on its own) that
would come out to three bytes apiece.  The break between three and
four bytes is between \x{ffff} and \x{10000}.
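To make the byte counts concrete, here is a minimal Python sketch (not from the original thread). It combines a UTF-16 surrogate pair into the single code point it stands for, and hand-encodes code points using the standard UTF-8 bit patterns. The helper names `combine_surrogates` and `utf8_encode` are hypothetical. The encoder deliberately accepts lone surrogates, which strict UTF-8 forbids, so that it can show what encoding each surrogate separately would cost.

```python
"""Sketch: UTF-8 byte lengths around the 3/4-byte boundary and for surrogates."""


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the UTF-8 bit patterns.

    Lone surrogates (U+D800..U+DFFF) are NOT rejected here, unlike strict
    UTF-8; they fall into the 3-byte pattern like any other BMP value.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if cp < 0x80:          # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 3 bytes: everything up to and including U+FFFF
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: U+10000 and above
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # Each value on its own: two BMP edge cases and two lone surrogates.
    for cp in (0xFFFF, 0x10000, 0xD801, 0xDC37):
        print(f"U+{cp:04X}: {utf8_encode(cp).hex(' ')}")
    # The same two surrogates combined into one supplementary code point.
    cp = combine_surrogates(0xD801, 0xDC37)
    print(f"D801 DC37 -> U+{cp:04X}: {utf8_encode(cp).hex(' ')}")
```

Running it shows U+FFFF taking three bytes and U+10000 taking four. The pair D801 DC37 costs six bytes when each surrogate is encoded separately, but only four (f0 90 90 b7) once combined into U+10437.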

Larry
