perl-unicode

Re: What to do with non-assigned points?

2002-03-18 04:46:43



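use strict;
use warnings;
use Encode qw(from_to);

# Every octet value in the upper half, 0x80..0xFF, as a byte string.
my $s = join "", map { chr } 128..255;
for my $enc (qw( iso8859_3 ))
{
   my $copy = $s;    # from_to() converts in place, so work on a copy
   from_to($copy, $enc, "utf-8");
}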

The problem is that iso-8859-3 does not assign characters to all octet
values.
What should Encode:: do in such cases:
  A. U+FFFD
  B. Map the octet to the Unicode code point with the same value (as iso-8859-1 does)
  C. Use a "private use" page...

If there is no function hanging on the 'illegal characters' hook, then it
should return U+FFFD. This is pretty clear from the standard, etc. The
other options don't bear thinking about, since they change the mapping
definition of iso8859-3.
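
In present-day Encode this is indeed the default. A minimal sketch, assuming a recent Encode module (the 2002 version discussed in this thread may have behaved differently): with the default CHECK value, decode() substitutes U+FFFD for each octet that iso-8859-3 leaves unassigned, so scanning 0x80..0xFF for U+FFFD lists the gaps, while passing Encode::FB_CROAK makes the same call die instead. The helper name unassigned_octets is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the octets in 0x80..0xFF that $enc leaves
# unassigned. Decoding with the default CHECK (Encode::FB_DEFAULT, i.e. 0)
# substitutes U+FFFD for unmapped input, so those octets come back as U+FFFD.
sub unassigned_octets {
    my ($enc) = @_;
    return grep { decode($enc, chr $_) eq "\x{FFFD}" } 0x80 .. 0xFF;
}

my @missing = unassigned_octets("iso-8859-3");
printf "unassigned: %s\n", join " ", map { sprintf "0x%02X", $_ } @missing;

# With Encode::FB_CROAK an unmapped octet is fatal instead of becoming U+FFFD.
my $octets = "\xA5";    # keep it in a variable: a CHECK flag may modify the source
my $ok = eval { decode("iso-8859-3", $octets, Encode::FB_CROAK()); 1 };
print $ok ? "decoded\n" : "croaked: $@";
```

For iso-8859-3 the scan should report the seven gaps 0xA5, 0xAE, 0xBE, 0xC3, 0xD0, 0xE3 and 0xF0, none of which is mapped to anything other than U+FFFD.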

The fun starts on the return path :) Ideally, each encoding should define
a default fallback character to emit when a Unicode value has no mapping
in it, i.e. the 8-bit equivalent of U+FFFD.
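
Present-day Encode does something close to this on the return path. A minimal sketch, assuming a recent Encode (code-reference CHECK values arrived in Encode 2.12): with the default CHECK, encode() writes the encoding's substitution character for anything the target cannot represent (a question mark for the iso-8859 tables, as far as I can tell), and a code reference passed as CHECK behaves like the 'illegal characters' hook mentioned above. The sample string is illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0126 (H with stroke) exists in iso-8859-3 as octet 0xA1;
# U+20AC (euro sign) has no mapping there.
my $text = "A\x{126}\x{20AC}B";

# Default CHECK: each unmappable character becomes the encoding's
# substitution character.
my $default = encode("iso-8859-3", $text);

# CHECK as a code reference: called with the code point of each
# unmappable character; the returned string is used in its place.
my $hooked = encode("iso-8859-3", $text, sub { sprintf "<U+%04X>", shift });

printf "%vX\n", $default;    # octets as hex, dot-separated
print "$hooked\n";
```

Passing a code reference leaves $text untouched, which is convenient when the same string is tried against several target encodings.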

Martin Hosken

