perl-unicode
AL32UTF8
2004-04-29 10:30:08

Am I right in thinking that perl's internal utf8 representation stores a character outside the BMP as a single 4-byte code point, and not as a surrogate pair encoded as two separate 3-byte code points? The single 4-byte form is what Oracle calls AL32UTF8.

What would be the effect of setting SvUTF8_on(sv) on a byte string that is otherwise valid utf8 but encodes such characters as surrogate pairs? Would there be problems? (For example, a string returned from Oracle when using the UTF8 character set instead of the newer AL32UTF8 one.)

Tim.
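
To make the two byte layouts in question concrete, here is a minimal sketch. It assumes that Oracle's older UTF8 character set encodes each UTF-16 surrogate separately, as the question supposes (the scheme Unicode calls CESU-8). It is written in Python purely to show the bytes; the helper names utf8, cesu8 and cesu8_to_utf8 are hypothetical, and nothing here describes what perl itself does with such a string.

```python
def utf8(cp: int) -> bytes:
    """Standard UTF-8: one sequence per code point (4 bytes above U+FFFF)."""
    return chr(cp).encode("utf-8")


def cesu8(cp: int) -> bytes:
    """Surrogate-pair form: split into UTF-16 surrogates, encode each as 3 bytes."""
    if cp < 0x10000:
        # BMP characters are encoded identically in both schemes.
        return chr(cp).encode("utf-8", "surrogatepass")
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)    # high (lead) surrogate
    low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate
    # "surrogatepass" lets Python emit the otherwise forbidden surrogate bytes.
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


def cesu8_to_utf8(data: bytes) -> bytes:
    """Re-pair the surrogates and re-encode as standard UTF-8."""
    text = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each surrogate pair into one code point.
    joined = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return joined.encode("utf-8")


if __name__ == "__main__":
    cp = 0x10000  # first code point that needs surrogates in UTF-16
    print(utf8(cp).hex(" "))
    print(cesu8(cp).hex(" "))
    print(cesu8_to_utf8(cesu8(cp)) == utf8(cp))
```

The practical difference is that the same character occupies 4 bytes in one form and 6 in the other, so a consumer that expects the 4-byte form would see two surrogate code points instead of one character unless the data is converted first.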