Hi,
I'm not exactly an expert in the various encodings or in Perl's Unicode
internals, so I apologize in advance if this isn't a well-polished
question...
Anyway, here goes:
I've recently upgraded to Perl 5.8 and migrated all of my Japanese
encoding conversion routines to Encode.pm. It mostly works like a charm
-- it's fast, and I like it -- but somehow XML::LibXML doesn't seem to
get along with it. I'm trying to do this:
1) parse an XML file (euc-jp encoding) with XML::LibXML
2) stuff the data from the XML into an euc-jp database.
3) when doing this, I do
    utf82euc( $xml->findvalue( 'foobar' ) );
where utf82euc() is a convenience function that I wrote which does:
    my $octets = decode( 'utf8', $text );
    return encode( 'euc-jp', $octets );
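For reference, here is a self-contained sketch of that helper. It
assumes the input is a plain byte string holding UTF-8 octets, which may
or may not be what findvalue() actually returns; the sample input is
made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Standalone version of the helper described above.
# Assumption: $text is a byte string containing UTF-8 octets.
sub utf82euc {
    my ($text) = @_;
    my $chars = decode( 'utf8', $text );    # UTF-8 octets -> Perl characters
    return encode( 'euc-jp', $chars );      # Perl characters -> EUC-JP octets
}

# Illustrative input: the three UTF-8 octets of HIRAGANA LETTER A (U+3042).
my $utf8_octets = "\xe3\x81\x82";
my $euc         = utf82euc($utf8_octets);

# Dump the resulting octets in hex.
print join( ' ', map { sprintf '%02X', ord } split //, $euc ), "\n";
```

With byte-string input like this the conversion goes through without
complaint, so the input in the failing spot is presumably something
different.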
The problem is that when I call decode(), I get the error
"Cannot decode string with wide characters"
I do the same thing in a bunch of different places, but this particular
location always gives me the error, REGARDLESS of the string that I put
in that particular tag!
At this point I really don't know who to blame/ask, so I'd like to do
some more debugging on my own -- but I don't know what that error means.
What exactly does it mean? Does it mean XML::LibXML (or libxml2?) is
somehow producing bad utf8? What are the criteria for getting this error?
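A small experiment along these lines might help narrow down the
criteria. It is only a sketch, and it assumes the error is tied to
handing decode() a string that already contains characters above 0xFF
rather than raw octets; whether that is what happens in my script is
exactly the open question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string (not octets): one code point above 0xFF.
my $chars = "\x{3042}";    # HIRAGANA LETTER A

# Trap the exception so the message can be inspected.
my $ok  = eval { decode( 'utf8', $chars ); 1 };
my $err = $ok ? '' : $@;

print $ok ? "decode succeeded\n" : "decode failed: $err";
```

If this dies with the same message, the string reaching decode() in the
failing spot may already be decoded, not malformed.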
My first suspect was XML::LibXML, but I added some debug prints to the
script and then ran
    ./myscript | iconv -f utf8 -t euc-jp
...and iconv had no problem converting, so I'm really confused now.
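The pipe test only shows what the bytes look like after print, so it
may not reveal whether the value inside Perl is a byte string or a
character string. A debugging sketch that looks at the string before it
is printed could be something like the following; describe_string() is a
hypothetical helper name, and the two sample strings are made up.

```perl
use strict;
use warnings;
use Encode qw(is_utf8);

# Hypothetical debugging helper: summarize how Perl holds a string.
sub describe_string {
    my ($s) = @_;
    my $flag = is_utf8($s) ? 1 : 0;                       # internal UTF-8 flag
    my $wide = grep { ord($_) > 0xFF } split //, $s;      # count of chars > 0xFF
    return sprintf 'utf8_flag=%d length=%d wide_chars=%d',
        $flag, length($s), $wide;
}

# The same text held two ways: as UTF-8 octets and as one character.
print describe_string("\xe3\x81\x82"), "\n";
print describe_string("\x{3042}"),     "\n";
```

Calling it on the return value of findvalue() right before utf82euc()
should show which of the two cases the failing spot is in.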
TIA for any help...
--d