perl-unicode

Re: Interpretation of non-UTF8 strings

2004-08-16 07:30:05

> Python explicitly distinguishes byte strings and Unicode strings,
> which allows the two models to coexist without ambiguity.
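A minimal Python 3 sketch of the separation described in the quote. Note that Python 3's `bytes`/`str` split is stricter than the Python 2 `str`/`unicode` split of 2004, which still coerced ASCII-only byte strings implicitly. The Latin-1 sample data is an illustrative assumption.

```python
# Python 3 sketch: byte strings (bytes) and text strings (str) are distinct types.
raw = "café".encode("latin-1")   # bytes from some legacy source: b'caf\xe9'
text = " naïve"                  # str: a sequence of Unicode code points

# Concatenating bytes with str raises TypeError instead of guessing a charset.
try:
    raw + text
    mixing_failed = False
except TypeError:
    mixing_failed = True

# The bytes must be decoded explicitly, with a charset the caller supplies.
combined = raw.decode("latin-1") + text
print(mixing_failed, combined)  # True café naïve
```

The point of the design is that the charset decision is made once, visibly, at the `decode` call, rather than being inferred when the strings meet.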

> I think that (not doing) this was the basic failure of the Perl Unicode
> model.  We made a valiant attempt at making them the same and allowing
> old legacy code to work, and I think we got close, but the scheme could
> carry us only so far.

Actually, I thought the Perl implementation of Unicode was pretty good, at least since 5.8.0 (the 5.6.x series was kind of broken IMHO, and there was no Encode module).

If a string has the UTF-8 flag on, you know it's OK. Otherwise, you need to use Encode to decode it into a UTF-8 string, and of course to do that you need to know the string's character set.

Of course, if you concatenate a non-UTF-8 string with a UTF-8 string and your locale is set incorrectly, you run into trouble. I think it's a bit crazy to expect Perl to automagically do the right thing anyway, so having to specify the character set on I/O operations is fine by me.
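A Python analogue of declaring the character set at the I/O boundary (this is not Perl's API; in Perl the equivalent is an `:encoding(UTF-8)` layer on `open`, or an explicit `Encode::decode` call). The temporary file and its UTF-8 contents are assumptions for illustration. It shows that the right declaration recovers the text, while a wrong one silently yields mojibake rather than an error.

```python
import os
import tempfile

# Hypothetical legacy file: its bytes are UTF-8, which the reader must know.
payload = "café"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload.encode("utf-8"))

# Declaring the right charset on input yields the intended characters.
with open(path, encoding="utf-8") as f:
    right = f.read()

# Declaring the wrong charset "succeeds" silently but produces mojibake:
# each byte of a UTF-8 multi-byte sequence is read as a separate Latin-1 character.
with open(path, encoding="latin-1") as f:
    wrong = f.read()

os.remove(path)
print(right)   # café
print(wrong)   # cafÃ©
```

Because Latin-1 maps every byte to some character, the wrong guess never raises; only knowing the real charset prevents the damage.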

And guys, thanks for Encode. Such a fine module :-)
