perl-unicode

Re: List of unsupported unicode characters?

2007-01-10 11:59:47
On Wed, 10 Jan 2007, Paul Bijnens wrote:
On 2007-01-10 08:10, John Costello wrote:
Is there a list of utf8 characters that perl cannot map, for example 
"\xA0"?  This is with Perl 5.8.3.

AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
latin1 (iso8859-1) or similar encodings.  That is just the "no-break
space".

Yes, that is the character I mean, though it is ISO-8859 (I seem to recall 
that one is a subset of the other).

What exactly is your problem with that character?

perl 5.8.3 complains 

        utf8 "\xA0" does not map to Unicode

when the file is read.  I'm specifying open(INFILE, 
"<:encoding($this->{'encoding'})", $this->{filename}), where 
$this->{'encoding'} is set to utf8 (confirmed that).
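
For reference, a minimal sketch of why a lone "\xA0" byte can trigger this
message, assuming the Encode module that ships with perl 5.8.  In UTF-8 the
no-break space U+00A0 is the two-byte sequence C2 A0; a single A0 byte, as
found in Latin-1 data, is not well-formed UTF-8.  The variable names below
are illustrative only and are not taken from the original script.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00A0 (no-break space) as a character, encoded to UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', "\x{A0}");
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $utf8_bytes;
print "U+00A0 in UTF-8: $hex\n";

# A lone 0xA0 byte, as it would appear in Latin-1 data.
# With FB_CROAK, decode() dies on malformed input instead of substituting.
my $latin1_bytes = "\xA0";
my $ok = eval { decode('UTF-8', $latin1_bytes, Encode::FB_CROAK); 1 };
print "lone 0xA0 decodes as UTF-8: ", ($ok ? "yes" : "no"), "\n";
```

If the second check fails, the input contains bytes that are not UTF-8,
whatever encoding name was passed to open().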
 
The file originally was generated by perl 5.6.1 with utf8 encoding
specified via binmode.  The file then was tarred, gzipped, scp'd, and
ungzipped and untarred and fed to perl 5.8.3.

Thanks to Darren for the pointer to perldelta and the Unicode versions.  I 
see that Unicode 4.0.0 does support \xA0, as well as the 110 other 
characters that perl 5.8.3 complains about.

If I drop the encoding statement and change the open command to
        open(INFILE, "<$this->{'filename'}")

the errors disappear.

...

This leads me to think that perl 5.6.1 isn't encoding the output into 
utf8, but that's a bit of a wild guess.
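
One way to test that guess would be to check whether the file's raw bytes are
well-formed UTF-8.  The following is only a sketch, assuming the Encode module
is available; is_valid_utf8_file is a hypothetical helper name, not part of
the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the file's bytes are well-formed UTF-8,
# 0 otherwise.  The file is read raw, so no decoding layer is involved.
sub is_valid_utf8_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    local $/;                 # slurp the whole file in one read
    my $bytes = <$fh>;
    close($fh);
    $bytes = '' unless defined $bytes;
    # Strict decode: dies on the first malformed sequence.
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}
```

A result of 0 for the file in question would be consistent with the data
having been written as single bytes rather than as UTF-8; a plain open()
without an encoding layer reads such bytes without complaint, which would
match the errors disappearing.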