perl-unicode

Re: Interpretation of non-UTF8 strings

2004-08-16 08:30:07
Nick Ing-Simmons wrote:

Marcin 'Qrczak' Kowalczyk <qrczak(_at_)knm(_dot_)org(_dot_)pl> writes:

But there is a simple workaround for that, as perluniintro would tell
you: the encoding pragma.

The encoding pragma partially works. It doesn't influence assumed
encoding of files opened without specifying the encoding, nor handling
of filenames, and it needs to be told about the encoding literally.
How to say it should be taken from the locale?


Once we had 

use encoding qw(locale);

But it did not work well as not all locale implementations
give the API to return the encoding.  
(And even en_GB can be in ASCII, 8859-1, 8859-15 (with euro), UTF-8, ...)

True.

For the open :locale I opted for an easy (cheesy?) algorithm:
(1) if we have langinfo(), use the return value of langinfo(CODESET).
(2) if we do not have langinfo(), look at %ENV for locale variables
    and look at the part after the dot, and use that value.
(3) Use the value from either (1) or (2) and if Encode recognizes that,
    good.  Otherwise give up.

Or something like that.  (It's documented in the open pragma, somewhere).
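
A minimal sketch of the three steps above, not the actual code in open.pm. The helper names codeset_from_locale_name() and guess_locale_encoding() are made up for illustration, and the choice of LC_ALL, LC_CTYPE and LANG (in that order) as "the locale variables" is an assumption. Only I18N::Langinfo's langinfo(CODESET) and Encode::find_encoding() are real library calls.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the part after the dot in a locale name
# such as "en_GB.ISO8859-15@euro", without any "@modifier" suffix.
sub codeset_from_locale_name {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /^[^.]+\.([^.\@]+)/ ? $1 : undef;
}

# Hypothetical helper following steps (1)-(3).
# $env defaults to \%ENV; pass a false $use_langinfo to skip step (1).
sub guess_locale_encoding {
    my ($env, $use_langinfo) = @_;
    $env          = \%ENV unless defined $env;
    $use_langinfo = 1     unless defined $use_langinfo;

    my $codeset;

    # (1) Ask the C library, if I18N::Langinfo exists on this platform.
    if ($use_langinfo) {
        eval {
            require I18N::Langinfo;
            I18N::Langinfo->import(qw(langinfo CODESET));
            $codeset = langinfo(CODESET());
        };
    }

    # (2) Otherwise fall back to the environment (variable list is an
    #     assumption) and take the part after the dot.
    unless (defined $codeset && length $codeset) {
        for my $var (qw(LC_ALL LC_CTYPE LANG)) {
            $codeset = codeset_from_locale_name($env->{$var});
            last if defined $codeset;
        }
    }

    # (3) Accept the value only if Encode recognizes it; otherwise give up.
    return undef unless defined $codeset && length $codeset;
    my $enc = Encode::find_encoding($codeset);
    return $enc ? $enc->name : undef;
}

my $guess = guess_locale_encoding();
print defined $guess ? "locale encoding: $guess\n" : "no usable encoding\n";
```

In a real program none of this is needed by hand: saying

    use open ':locale';

asks the open pragma to do the lookup itself.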


-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
