perl-unicode

encoding problems

2004-01-28 11:30:04
Hi,

I'm using MSXML 4 in combination with Perl 5.8.0. I have an XML file with
some Unicode in it.
Parsing the file goes OK. When I try to read a text node containing Unicode,
something strange happens: when I open the XML file in a text viewer I can
clearly see 3 separate characters, but when I count the characters in the
variable where I put the node's text, only 1 character is left. Reading this
value in a Unicode-aware editor gives me the wrong representation.
The file is UTF-8 encoded, according to the XML declaration, and viewing the
file in IE looks good.
This is what I do:
my $tmp = $node->selectSingleNode('ID')->{'text'};    # puts the node's text in $tmp
if ($node->selectSingleNode('ID[.="' . $tmp . '"]')) {
    # should return true,
    # but returns false, because $tmp and the node value aren't the same anymore
}

The character I'm trying to read is Unicode character U+2248, UTF-8 encoded as
0xE2 0x89 0x88; the character MSXML/Perl returns to me is just a single
character, 0x98 (a tilde).
Does anybody have any idea?
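For reference, here is a minimal Perl sketch, assuming only the core Encode module shipped with Perl 5.8, of what a correctly decoded value should look like. It does not touch MSXML at all, and the variable names (including `$received`, standing in for the value that comes back) are illustrative. It shows that U+2248 is one character whose code point is 0x2248, while its UTF-8 form on disk is the three bytes 0xE2 0x89 0x88. A string whose single character is 0x98 is a different string, so an equality test like the XPath comparison above fails.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2248 ALMOST EQUAL TO, as a Perl character string
my $char = "\x{2248}";

# Encode it to UTF-8 octets, i.e. the bytes actually stored in the file
my $octets = encode('UTF-8', $char);

# One character, but three bytes on disk
my $char_count = length($char);      # 1
my $byte_count = length($octets);    # 3

# Hex dump of the stored bytes: "e2 89 88"
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $octets;

# The code point a correctly decoded node text should have: "U+2248"
my $code_point = sprintf 'U+%04X', ord($char);

print "characters: $char_count\n";
print "bytes:      $byte_count ($hex)\n";
print "code point: $code_point\n";

# Hypothetical received value: a single character 0x98 is not the same string
my $received = "\x{98}";
print "same string? ", ($received eq $char ? "yes" : "no"), "\n";
```

Counting 1 character is therefore expected for a decoded string; the telling part is which code point that single character has.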

Hans


encoding problems, Hans Scholte