perl-unicode

Re: Interpretation of non-UTF8 strings

2004-08-16 07:30:04
> It only works for UTF-8, not for other encodings (perl-5.8.4):

As documented and intended.

> $ perl -C -e 'print chr(0x104), "\n"'
> Wide character in print at -e line 1.
> [gibberish output, UTF-8-reinterpreted-as-ISO-8859-2]
>
> The default encoding (of the locale and of the terminal) is ISO-8859-2,
> which *is* capable of representing U+0104.

$ perl -Mencoding=latin2 -e 'print chr(0x104), "\n"'

gives you what you want.
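
The same conversion can also be spelled out without the pragma. This is only a minimal sketch, assuming the Encode module that ships with perl 5.8; to_latin2 is a hypothetical helper name, not something from Perl itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a character string into ISO-8859-2 bytes.
sub to_latin2 {
    my ($text) = @_;
    # FB_CROAK makes encode() die on characters ISO-8859-2 cannot represent,
    # instead of silently substituting a replacement character.
    return encode('iso-8859-2', $text, Encode::FB_CROAK);
}

# U+0104 (LATIN CAPITAL LETTER A WITH OGONEK) maps to a single byte.
my $bytes = to_latin2(chr(0x104));

# Show the byte in hex; printing $bytes itself to an ISO-8859-2
# terminal would display the character.
printf "%02X\n", ord($bytes);
```

Because the result is a plain byte string, printing it does not trigger the "Wide character" warning.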

> In summary, some parts of Perl treat non-UTF-8 scalars as ISO-8859-1,
> while others treat them as whatever is expected by default in files,
> filenames, and on the command line (the locale tells what it is). It
> should be decided one way or the other, otherwise generic code doesn't
> know how to interpret Perl scalars it encounters.

"generic code"?  If you mean Perl, you can use utf8::is_utf8().  If you
mean XS, you can use SvUTF8().
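
A minimal sketch of the Perl-level check, with describe_scalar as a hypothetical helper name. Note that the flag only reports whether the scalar is marked internally as UTF-8; it does not say which encoding an unflagged byte string came from.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a scalar carries Perl's internal
# UTF-8 flag. utf8::is_utf8() is always available; no "use utf8" needed.
sub describe_scalar {
    my ($s) = @_;
    return utf8::is_utf8($s) ? 'utf8-flagged' : 'byte string';
}

# A code point above 0xFF cannot be stored without the flag.
print describe_scalar(chr(0x104)), "\n";

# A single character in the 0x00-0xFF range, written as a byte escape.
print describe_scalar("\xA1"), "\n";
```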

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'."
-- Jack Cohen
