perl-unicode

Re: is it utf8 or unicode?

2005-03-16 03:38:15
On Mon, Mar 14, 2005 at 12:14:12PM +0000, 
unicode(_at_)ftumsh(_dot_)demon(_dot_)co(_dot_)uk wrote:

Here's the problem:
I have the data in a database. It is UTF-8 encoded, so it arrives in Perl
as \xC3\x84. I turn on the UTF-8 flag and then output it as XML
using the module XML::LibXML, which has two output
methods, toFH and toString.
If I generate XML from the above data with an encoding of UTF-8,
I get two different files. One is correct (using toFH); the other
isn't (it contains xC4, which is invalid UTF-8).
toFH does not use Perl's IO; toString does.
I thought at first that the module might be incorrect. However,
when the XML created by toString is parsed in memory, it passes OK,
i.e. the error occurs during the output, which means the module is OK.
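
A minimal sketch of one way a lone xC4 byte like the one described above
can appear on output. It does not use XML::LibXML and is not a diagnosis
of that module. It assumes the data is a one-character string (U+00C4)
with the UTF-8 flag on, printed through a handle with no encoding layer.
The helper written_bytes and the in-memory handles are illustrative
choices, not part of the original setup.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for U+00C4, as they might arrive from the database.
my $bytes = "\xC3\x84";

# Decode into a character string: one character, UTF-8 flag on.
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: print $string to an in-memory handle opened with
# the given layer and return the bytes that were actually written.
sub written_bytes {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return $buffer;
}

# Format a byte string as space-separated hex pairs.
sub hex_dump {
    return join ' ', map { sprintf '%02X', ord } split //, $_[0];
}

# No layer: Perl writes the character as the single Latin-1 byte C4.
my $without_layer = written_bytes('', $chars);

# With an explicit UTF-8 layer: the character is written as C3 84.
my $with_layer = written_bytes(':encoding(UTF-8)', $chars);

print "no layer:    ", hex_dump($without_layer), "\n";
print "UTF-8 layer: ", hex_dump($with_layer), "\n";
```

If this is what is happening, adding an encoding layer to the output
handle (for example with binmode) would be one thing to try.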

We (at work) think that the module is buggy, but we have yet to formally
report it. Specifically, its XS code is not checking the internal UTF8
flag before operating on the PV (the scalar's raw string buffer).
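
To illustrate why that flag matters, here is a small sketch in plain Perl
(not the module's XS code). It assumes nothing beyond core Perl; the
variable names are made up for the example. The same one-character string
can be held in two internal forms, and only the flag says how the buffer
bytes should be read.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the bytes pragma

# One character, U+00C4, stored as a single byte (UTF8 flag off).
my $downgraded = "\xC4";

# The same character, stored internally as two bytes (UTF8 flag on).
my $upgraded = "\xC4";
utf8::upgrade($upgraded);

# Perl treats the two as the same string.
print "equal:          ", ($downgraded eq $upgraded ? "yes" : "no"), "\n";

# The flag differs.
print "flag off/on:    ",
    (utf8::is_utf8($downgraded) ? 1 : 0), "/",
    (utf8::is_utf8($upgraded)   ? 1 : 0), "\n";

# So does the length of the underlying buffer: 1 byte versus 2 bytes.
print "buffer lengths: ",
    bytes::length($downgraded), "/", bytes::length($upgraded), "\n";
```

Code that reads the buffer without consulting the flag would see
different bytes for these two equal strings.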

Nicholas Clark
