perl-unicode

Re: AL32UTF8

2004-04-30 02:30:05
On Thu, 2004-04-29 at 11:16, Tim Bunce wrote:
> Am I right in thinking that perl's internal utf8 representation
> represents surrogates as a single (4 byte) code point and not as
> two separate code points?
>
> This is the form that Oracle call AL32UTF8.
>
> What would be the effect of setting SvUTF8_on(sv) on a valid utf8
> byte string that used surrogates? Would there be problems?
> (For example, a string returned from Oracle when using the UTF8
> character set instead of the newer AL32UTF8 one.)

I think it makes no difference (at least I could not find one), except
for the internal storage.  Several of the tests I wrote print a SQL
DUMP(nch), and you can see the difference in the internal storage in
that output.  The strings come back to the client the way they were
put in.
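
To make the storage difference concrete, here is a minimal sketch of the
two byte forms for one character outside the BMP.  It assumes that
Oracle's older UTF8 character set stores such a character as a surrogate
pair with each half encoded as three bytes (the CESU-8 form), while
AL32UTF8 uses the standard four-byte form.  The helper names
(utf8_bytes, cesu8_bytes, hex_dump) are made up for this illustration
and are not part of DBD::Oracle or Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Standard UTF-8 bytes for one code point (assumed to match AL32UTF8).
sub utf8_bytes {
    my ($cp) = @_;
    return encode('UTF-8', chr($cp));
}

# CESU-8 style bytes (assumed to match Oracle's older UTF8 charset):
# a supplementary character is split into a UTF-16 surrogate pair and
# each surrogate is written as its own 3-byte sequence.
sub cesu8_bytes {
    my ($cp) = @_;
    return utf8_bytes($cp) if $cp < 0x10000;   # BMP: identical to UTF-8

    my $v     = $cp - 0x10000;
    my @pair  = (0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));
    my $bytes = '';
    for my $s (@pair) {
        $bytes .= pack('C3',
            0xE0 | ($s >> 12),
            0x80 | (($s >> 6) & 0x3F),
            0x80 | ($s & 0x3F));
    }
    return $bytes;
}

# Space-separated hex dump of a byte string.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack('C*', $bytes);
}

my $cp = 0x10400;   # a code point outside the BMP
print "4-byte form:    ", hex_dump(utf8_bytes($cp)),  "\n";
print "surrogate form: ", hex_dump(cesu8_bytes($cp)), "\n";
```

For U+10400 this prints F0 90 90 80 for the four-byte form and
ED A0 81 ED B0 80 for the surrogate form.  For anything inside the BMP
the two forms are byte-for-byte identical.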

I have tested this with four databases:

dbcharset   ncharset
---------   ---------
us7ascii    utf8
us7ascii    al16utf16
utf8        utf8
utf8        al16utf16

All tests produce the same results on all four databases, with either
.UTF8 or .AL32UTF8 in NLS_LANG.

Lincoln

