perl-unicode

AL32UTF8

2004-04-29 10:30:08
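Am I right in thinking that perl's internal utf8 representation
encodes a character outside the BMP (one that UTF-16 represents
with a surrogate pair) as a single 4-byte sequence for one code
point, and not as two separately encoded surrogate code points?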

This is the form that Oracle calls AL32UTF8.

What would be the effect of calling SvUTF8_on(sv) on a byte string
that is otherwise well-formed utf8 but encodes supplementary
characters as surrogate pairs (two 3-byte sequences, 6 bytes in
total)? Would there be problems?
(For example, a string returned from Oracle when using the UTF8
character set instead of the newer AL32UTF8 one.)
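
To make the two byte forms concrete, here is a small sketch in Python
(chosen only for illustration; it says nothing about what perl itself
does with such a string). The function names utf8_bytes and cesu8_bytes
are hypothetical, and the surrogate-pair form shown is my understanding
of what the older Oracle UTF8 character set produces.

```python
def utf8_bytes(cp: int) -> bytes:
    """Standard UTF-8: a supplementary code point is one 4-byte sequence."""
    return chr(cp).encode("utf-8")


def cesu8_bytes(cp: int) -> bytes:
    """Surrogate-pair form: each UTF-16 surrogate is encoded as 3 bytes.

    Assumption: this is the layout the older Oracle UTF8 charset uses
    for characters outside the BMP.
    """
    if cp < 0x10000:
        # BMP characters are encoded identically in both forms.
        return chr(cp).encode("utf-8")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate
    # 'surrogatepass' lets Python emit the 3-byte form of a lone surrogate.
    return b"".join(
        chr(s).encode("utf-8", "surrogatepass") for s in (high, low)
    )


if __name__ == "__main__":
    cp = 0x10400  # DESERET CAPITAL LETTER LONG I, outside the BMP
    print(utf8_bytes(cp).hex())   # f0909080
    print(cesu8_bytes(cp).hex())  # eda081edb080
```

So the question is what happens when the second, 6-byte form is flagged
as utf8 without being converted to the first.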

Tim.
