perl-unicode

RE: Warning messages for ill-formed data

2003-03-25 09:30:04
Hi all-
  I want to clarify what I was trying to say:

Use the optional 3rd argument to decode().

$utf8 = decode("Big5" => $big5);            # ill-formed chars are mapped to U+FFFD
$utf8 = decode("Big5" => $big5, Encode::);  # same but warnings issued

See "Handling Malformed Data" in "perldoc Encode" for how to use the
3rd argument.
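
For reference, here is a minimal sketch of how the three most common CHECK
values behave. It uses "ascii" instead of Big5 on purpose: which bytes Big5
rejects depends on the installed map (the very point under discussion),
whereas \x88 is certain to be unmappable in ASCII. The variable names are
only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "ascii" stands in for Big5 here: \x88 is certain to be unmappable in it.
my $octets = "ab\x88cd";

# No CHECK argument: the bad byte is silently replaced with U+FFFD.
my $default = decode('ascii', $octets);

# FB_WARN: a warning is issued and decoding stops at the bad byte.
# CHECK modes other than the default modify the source, so decode a copy.
my @warnings;
my $partial = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $copy = $octets;
    decode('ascii', $copy, Encode::FB_WARN);
};

# FB_CROAK: decode() dies, so wrap it in eval to catch the error.
my $ok    = eval { my $copy = $octets; decode('ascii', $copy, Encode::FB_CROAK); 1 };
my $error = $ok ? '' : $@;

print "warning: $_" for @warnings;
print "error:   $error";
```

Note that FB_WARN does not substitute U+FFFD and carry on; it returns only
the portion decoded before the bad byte and leaves the rest in the source
variable.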

I don't think either FB_WARN or FB_CROAK catches the type of malformed data
I was describing (upper-ASCII bytes outside of the Big5 range).

If I understand correctly, though, SADAHIRO Tomoyuki and Dan Kogai
proposed correcting this by removing single-byte upper-ASCII characters
from the \x80-\xA0 range in the big5-eten map (and big5-hkscs).  Is this
correct?  If so, should the other GB and Big5 maps be checked so that
single-byte upper-ASCII mappings can be removed in the same way?


after the patch (warned)
big5-eten "\x88" does not map to Unicode
 at D:/perl/bp581/lib/Encode.pm line 156.

The message is not 'big5-eten "\x88\x71" does not map to Unicode..',
of course, because big5-eten.ucm does not define "\x88\x71" as a
double-byte char.  The two-byte form may be what is expected, though.

The 2-byte version ("\x88\x71") would be a more helpful warning to me.
Although in an earlier email you accurately pointed out that it may be
ambiguous what type of error exists in such a case, displaying the
subsequent byte helps both in determining what the error is and in
locating it in the original text.  Additionally, it would be helpful to
specify the text source (i.e. file name) in the warning message, if
possible.
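
To make the request concrete, here is a rough sketch of the kind of report
I have in mind, done in user code rather than inside Encode.
decode_with_context() is a hypothetical helper, not an Encode function, and
the message wording, the 'sample.txt' label and the byte offset are my own
additions. It again uses "ascii" so the result does not depend on which
Big5 map is installed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode $octets and, for every
# byte the decoder stops on, record its offset, the byte itself plus the
# byte after it, and a caller-supplied source label such as a file name.
sub decode_with_context {
    my ($encoding, $octets, $source) = @_;    # $octets is a private copy
    my ($text, $offset, @problems) = ('', 0);

    while (length $octets) {
        my $before = length $octets;

        # FB_QUIET returns what decoded cleanly and leaves the rest in $octets.
        $text   .= decode($encoding, $octets, Encode::FB_QUIET);
        $offset += $before - length $octets;
        last unless length $octets;

        # Show the stopping byte and its successor, e.g. \x88\x71.
        my $context = join '',
            map { sprintf '\\x%02X', ord } split //, substr($octets, 0, 2);
        push @problems,
            sprintf('%s "%s" does not map to Unicode at byte offset %d of %s',
                    $encoding, $context, $offset, $source);

        # Substitute U+FFFD for the bad byte and resume after it.
        $text .= "\x{FFFD}";
        substr($octets, 0, 1, '');
        $offset++;
    }
    return ($text, \@problems);
}

my ($text, $problems) = decode_with_context('ascii', "ab\x88qd", 'sample.txt');
print "$_\n" for @$problems;
```

The helper only ever skips one byte per error, so it makes no claim about
whether the second byte is itself part of the problem; it just shows it.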

Mark


