perl-unicode

Problems with XML - What exactly does "Cannot decode string with wide characters" mean?

2002-11-12 01:30:07

Hi,

I'm not exactly an expert in the various encodings or the Perl internals
on Unicode, so I apologize in advance if this isn't exactly a
well-polished question...

Anyway, here goes:

I've recently converted to Perl 5.8, and thus migrated all of my Japanese
encoding conversion routines to Encode.pm. It mostly works like a charm
-- it's fast, and I like it, but somehow XML::LibXML doesn't seem to
like it. I'm trying to do this:

  1) parse an XML file (euc-jp encoding) with XML::LibXML
  2) stuff the data from the XML into an euc-jp database.
  3) when doing this, I do

      utf82euc( $xml->findvalue( 'foobar' ) );

   where utf82euc() is a convenience function that I wrote which does:

      my $octets = decode( 'utf8', $text );
      return encode( 'euc-jp', $octets );

The problem is that when I call decode(), I get the error

    "Cannot decode string with wide characters"
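
For reference, here is a minimal sketch that triggers the same message. It
assumes the string handed to decode() is already a Perl character string
containing a code point above 0xFF; whether findvalue() actually returns
such a string in the failing case is an assumption, and the sample text is
made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the input is already a character string, not UTF-8 octets.
my $text = "\x{3042}";    # HIRAGANA LETTER A, a made-up sample value

# decode() expects octets; a character above 0xFF cannot be treated as one.
my $decoded = eval { decode('utf8', $text) };
my $error   = $@;
print "decode() died: $error" if $error;

# Encoding the character string directly needs no decode() step.
my $euc = encode('euc-jp', $text);
print "euc-jp octets (hex): ", unpack('H*', $euc), "\n";
```

If that assumption holds, the decode() call in utf82euc() would be the step
that fails, and encode() alone would be enough for such input.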

I do the same thing in a bunch of different places, but this particular
location always gives me the error, REGARDLESS of the string that
I put in that particular tag!

At this point I really don't know who to blame/ask, so I'd like to do
some more debugging on my own -- but I don't know what that error means.

What exactly does that mean? Does it mean XML::LibXML (or libxml2?) is
somehow producing bad utf8? What are the criteria for getting this error?

My first suspect was XML::LibXML, but I put some debug prints, and then
did a

    ./myscript | iconv -f utf8 -t euc-jp

...and iconv had no problem converting, so I'm really confused now.
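
One possible explanation for the iconv result, sketched under the assumption
that the debug print writes a character string to a filehandle with no
encoding layer: Perl then emits the UTF-8 form of the characters (along with
a "Wide character in print" warning), so iconv sees valid UTF-8 either way.
The has_wide_chars() helper and the sample string below are hypothetical and
only illustrate the difference between the two forms.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: report whether a string holds any character
# above 0xFF, which is the case decode() cannot accept.
sub has_wide_chars {
    my ($string) = @_;
    return $string =~ /[^\x00-\xFF]/ ? 1 : 0;
}

my $chars  = "\x{65E5}\x{672C}";        # assumed stand-in for findvalue() output
my $octets = encode('utf8', $chars);    # the UTF-8 octets for those characters

printf "chars:  wide=%d length=%d\n", has_wide_chars($chars),  length($chars);
printf "octets: wide=%d length=%d\n", has_wide_chars($octets), length($octets);

# Only the octet form is valid input for decode().
my $round_trip = decode('utf8', $octets);
print "round trip ok\n" if $round_trip eq $chars;
```

A check like this, placed just before the failing decode() call, would show
which of the two forms the string is in at that point.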

TIA for any help...

--d