Re: Problems with Perl Asian encodings?

Hi,

Samuel L. Bayer wrote:

So the outcome was that there's a mode in GNU recode which will drop these illegal first bytes. So the question is: is the same thing possible in Perl Encode? The documentation for some of the FB_ variables is tempting, but pretty opaque.

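Yes, the way to do it is by using Encode::FB_QUIET. With that flag, decode() stops at the first byte it can't handle, returns whatever it has decoded so far, and leaves the unprocessed remainder in $text. Basically, here's how you would do it... if $text holds the bytes you want to decode into Perl's internal UTF-8 form, then this should do the trick: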


-----
use Encode;

my $textcopy = $text;   # decode() with FB_QUIET overwrites $text, so keep the original
my $encoding = "gb2312";

my $decoded = decode($encoding, $text, Encode::FB_QUIET);

while ($text ne "") {   # this loops while we've still got bad bytes to deal with
  ### my $badbyte = substr($text, 0, 1);   # $badbyte now contains the invalid byte
  ### my $hex = sprintf("%02X", ord($badbyte));   # always two hex digits, zero-padded
  ### print STDERR "Invalid byte \\x$hex in input - dropping.\n";

  $text = substr($text, 1);   # skip over the bad character
  $decoded .= decode($encoding, $text, Encode::FB_QUIET);
}

print "Output: $decoded\n";
-----

The code as given silently drops every bad byte and prints no warnings; if you want warnings, uncomment the lines marked with ###. It depends what you want your code to do. :D
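
If you need this in more than one place, the same loop can be wrapped in a sub that hands back the dropped bytes, so the caller decides whether to warn. This is only a rough sketch: decode_dropping_bad_bytes is a made-up name (Encode doesn't provide it), and the "A\xFFB" input is just a test string that I'm assuming contains one byte that is invalid in gb2312.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode $bytes from $encoding,
# dropping every byte the decoder rejects. Returns the decoded string
# followed by the list of dropped byte values.
sub decode_dropping_bad_bytes {
    my ($encoding, $bytes) = @_;   # $bytes is a private copy; the caller's data is untouched
    my @dropped;

    # With FB_QUIET, decode() returns what it managed to decode and
    # leaves the unprocessed remainder in $bytes.
    my $decoded = decode($encoding, $bytes, Encode::FB_QUIET);

    while (length $bytes) {
        push @dropped, ord(substr($bytes, 0, 1));   # remember the offending byte
        $bytes = substr($bytes, 1);                  # skip over it
        $decoded .= decode($encoding, $bytes, Encode::FB_QUIET);
    }
    return ($decoded, @dropped);
}

# Assumed test input: "A" and "B" with one invalid byte in between.
my ($out, @bad) = decode_dropping_bad_bytes("gb2312", "A\xFFB");
printf STDERR "Invalid byte \\x%02X in input - dropping.\n", $_ for @bad;
print "Output: $out\n";
```

Returning the dropped bytes instead of printing inside the loop keeps the warn-or-stay-quiet decision with the caller, which is the same choice the ### lines give you.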


Hope this helps!

 - Ciaran.