On Wed, 10 Jan 2007, Paul Bijnens wrote:
On 2007-01-10 08:10, John Costello wrote:
Is there a list of utf8 characters that perl cannot map, for example
"\xA0"? This is with Perl 5.8.3.
AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
latin1 (iso8859-1) or similar encodings. That is just the "no-break
space".
Yes, that is the character I mean, though it is ISO-8859 (I seem to recall
that one is a subset of the other).
What exactly is your problem with that character?
perl 5.8.3 complains
utf8 "\xA0" does not map to Unicode
when the file is read. I'm specifying open(INFILE,
"<:encoding($this->{'encoding'})", $this->{filename}), where
$this->{'encoding'} is set to utf8 (confirmed that).
The file originally was generated by perl 5.6.1 with utf8 encoding
specified via binmode. The file then was tarred, gzipped, scp'd, and
ungzipped and untarred and fed to perl 5.8.3.
Thanks to Darren for the pointer to perldelta and the Unicode versions. I
see that Unicode 4.0.0 does support \xA0, as well as the 110 other
characters that perl 5.8.3 complains about.
If I drop the encoding statement and change the open command to
open(INFILE, "<$this->{'filename'}")
the errors disappear.
...
This leads me to think that perl 5.6.1 isn't encoding the output into
utf8, but that's a bit of a wild guess.
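
A minimal sketch of how that guess could be checked, assuming the Encode module that ships with perl 5.8 is available. The helper name is_valid_utf8 is hypothetical, and the two byte strings stand in for what the file might contain: a lone A0 byte is how Latin-1 stores the no-break space, while UTF-8 stores the same character (U+00A0) as the two bytes C2 A0.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() may modify its input
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $latin1_nbsp = "\xA0";        # no-break space as a single Latin-1 byte
my $utf8_nbsp   = "\xC2\xA0";    # the same character (U+00A0) encoded as UTF-8

printf "lone A0 byte: %s\n", is_valid_utf8($latin1_nbsp) ? "valid" : "not valid";
printf "C2 A0 bytes:  %s\n", is_valid_utf8($utf8_nbsp)   ? "valid" : "not valid";
```

If the file, read without any encoding layer, contains A0 bytes that are not preceded by C2, that would be consistent with perl 5.6.1 having written Latin-1 bytes and not utf8.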