perl-unicode

Re: Interpretation of non-UTF8 strings

2004-08-16 07:30:09
Marcin 'Qrczak' Kowalczyk wrote:

In a message of Mon, 16-08-2004, 16:31 +0300, Jarkko Hietaniemi
wrote:


In summary, some parts of Perl treat non-UTF-8 scalars as ISO-8859-1,
while others treat them as whatever is expected by default in files,
filenames and the command line (the locale tells what it is). It should be
decided one way or the other, otherwise generic code doesn't know how to
interpret Perl scalars it encounters.

"generic code"?  If you mean Perl, you can use utf8::is_utf8().  If you
mean XS, you can use SvUTF8().


I mean XS. If SvUTF8 is false, I don't know whether to interpret the
contents as ISO-8859-1 or according to the locale.

True.  But if you know nothing of where the SVs are coming from, you would
not know it anyway, I think.  You cannot even know whether the bytes in the
SV are characters at all, rather than a binary pack() buffer or a vec()
bitvector, for example.
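
A minimal sketch of that ambiguity in Perl, assuming the Encode module
is available and using ISO-8859-2 as a stand-in for "whatever the locale
says" (the byte value and the second encoding are illustrative choices,
not something from the thread):

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte, 0xB1, in a scalar whose UTF-8 flag is off.
my $bytes = "\xB1";

# utf8::is_utf8() reports only the internal flag; it says nothing
# about which encoding the bytes were meant to be in.
my $flag = utf8::is_utf8($bytes) ? 1 : 0;

# Reading 1: ISO-8859-1, where 0xB1 is U+00B1 (plus-minus sign).
my $as_latin1 = decode('ISO-8859-1', $bytes);

# Reading 2: ISO-8859-2 (assumed locale encoding), where 0xB1 is
# U+0105 (a with ogonek).
my $as_latin2 = decode('ISO-8859-2', $bytes);

# Reading 3: not text at all, just the output of pack().
my $packed = pack('C', 177);

printf "flag=%d\n", $flag;                                  # flag=0
printf "ISO-8859-1: U+%04X\n", ord($as_latin1);             # U+00B1
printf "ISO-8859-2: U+%04X\n", ord($as_latin2);             # U+0105
printf "same as pack buffer: %s\n",
    ($packed eq $bytes ? 'yes' : 'no');                     # yes
```

All three scalars start from the same byte with the flag off, so nothing
in the SV itself distinguishes the cases; the code that produced the
value has to say which reading applies.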

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'."
-- Jack Cohen
