Re: Interpretation of non-UTF8 strings

Python explicitly distinguishes byte strings and Unicode strings,
which allows the two models to coexist without ambiguity.
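
For readers who have not seen that distinction in practice, here is a minimal sketch. It assumes Python 3, where the two types are called `str` and `bytes`; the helper `mix` is a hypothetical name used only for this illustration.

```python
# Python 3: str holds Unicode code points, bytes holds raw octets.
text = "caf\u00e9"            # Unicode string, 4 characters
raw = text.encode("utf-8")    # byte string, 5 octets (the e-acute takes two)


def mix(a, b):
    """Try to concatenate a and b; return None if the types do not match."""
    try:
        return a + b
    except TypeError:
        # str + bytes is refused rather than silently converted
        return None


mixed = mix(text, raw)                    # None: implicit mixing is rejected
joined = mix(text, raw.decode("utf-8"))   # works after an explicit decode
```

The point of the sketch is only that the conversion has to be spelled out, together with the encoding, before the two kinds of string can be combined.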


I think that (not doing) this was the basic failure of the Perl Unicode
model.  We made a valiant attempt at making them the same and allowing
old legacy code to work, and I think we got close, but the scheme could
carry us only so far.

Actually, I thought that the Perl implementation of Unicode was pretty good, at least since 5.8.0... (the 5.6.x series was kind of broken IMHO, and there was no 'Encode'...).

If a string has the UTF-8 flag on, then you know that it's OK. Otherwise, you know that you need to use Encode to turn it into UTF-8. And of course, to do that you need to know the string's character set.
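
To illustrate why the character set has to be known, here is a rough sketch in Python 3 terms (not Perl's Encode API). The input bytes are a made-up example, and the two candidate charsets are assumptions chosen because they disagree on the octet 0xA4.

```python
# Hypothetical input: the same octets, with no record of their charset.
raw = b"10 \xa4"

# 0xA4 is the generic currency sign in ISO-8859-1 but the euro sign in ISO-8859-15.
as_latin1 = raw.decode("iso-8859-1")
as_latin9 = raw.decode("iso-8859-15")

# Converting to UTF-8 therefore gives different results depending on the guess.
utf8_from_latin1 = as_latin1.encode("utf-8")
utf8_from_latin9 = as_latin9.encode("utf-8")
```

Both decodes succeed without any error, so nothing in the data itself says which guess was right.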

Of course, if you concatenate a non-UTF-8 string with a UTF-8 string and your locale is set incorrectly, then you run into trouble... I think it's a bit crazy to want Perl to automagically do the right thing anyway, so having to specify the character set on I/O operations is fine by me...


And guys, thanks for Encode. Such a fine module :-)
