perl-unicode

Re: Interpretation of non-UTF8 strings

2004-08-16 07:30:11
Marcin 'Qrczak' Kowalczyk wrote:

In a message of Mon, 16-08-2004, at 16:54 +0300, Jarkko Hietaniemi
wrote:


The encoding pragma partially works. It doesn't influence assumed
encoding of files opened without specifying the encoding, nor handling
of filenames, and it needs to be told about the encoding literally.
How to say it should be taken from the locale?

The 'use open ":locale"' does not work for you?


[qrczak ~]$ echo ąćę >1
[qrczak ~]$ perl -e 'use open ":locale"; open F, "1"; print <F>'
ąćę
[qrczak ~]$ perl -Mencoding=latin2 -e 'use open ":locale"; open F, "1"; print <F>'
"\x{66a8a}" does not map to iso-8859-2 at -e line 1, <F> line 1.

You probably need

perl -Mencoding=latin2 -C -e 'use open ":locale"; open F, "1"; print <F>'

With Perl 5.8.1 in SunOS 8 with LC_ALL set to pl_PL.ISO8859-2 I get
"ąćę" (0xb1 0xe6 0xea) with that.  I have to admit that needing the -C
smells like a bug, I'll investigate.

Also, 'use encoding' must still have the encoding specified literally,
it can't be taken from the locale

True.  I can't now remember whether there was any particular technical
limitation why that wasn't implemented.  I'll look into it when I find
the time.
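Until something like that exists, one possible workaround is to ask the
locale for its codeset at run time and apply the matching I/O layer by
hand. The following is only a minimal sketch under stated assumptions,
not what 'use encoding' or 'use open' do internally: the helper name
locale_encoding_name is hypothetical, the Latin-1 fallback is an
arbitrary choice, and it assumes the platform provides I18N::Langinfo
with CODESET. The file name "1" is the one from the example above.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode ();

# Hypothetical helper: return the Encode name of the locale's codeset,
# falling back to Latin-1 (an assumption) if Encode does not know it.
sub locale_encoding_name {
    setlocale(LC_CTYPE, "");              # adopt the locale from the environment
    my $codeset = langinfo(CODESET);       # e.g. "ISO-8859-2" or "UTF-8"
    my $enc = Encode::find_encoding($codeset);
    return $enc ? $enc->name : "iso-8859-1";
}

my $name = locale_encoding_name();

# Apply the locale's encoding explicitly instead of naming it literally.
binmode(STDOUT, ":encoding($name)");
if (open(my $fh, "<:encoding($name)", "1")) {
    print while <$fh>;
    close $fh;
}
```

This only covers file contents and STDOUT; it does nothing for string
literals in the source or for filenames.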

(I can live with that in the case of
the Kogut <-> Perl bridge, as I can just put the encoding there because
I determine it myself anyway), and neither fixes handling of filenames.

The handling of filenames will not get fixed any time soon unless
someone fixes it.  (Rather obvious, isn't it?)  I suggest that anyone
who thinks it's easy read the archives of this list; it has been
discussed many times in the past.  The major problem from Perl's
viewpoint is that the "fixes" are different for each platform
(UNIX, Win32, Mac OS), but Perl of course should at least attempt to
present a unified Perl-level interface for all that.
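To illustrate just the UNIX side of this: there, filenames reach Perl
as plain byte strings, so a script can convert them itself. The sketch
below is an assumption-laden example, not a proposed fix: the helper
names are hypothetical, the encoding is passed in by the caller (it
could come from the locale), and nothing here applies to Win32 or
Mac OS, which is exactly the problem described above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers: convert between filename bytes, as returned by
# readdir on UNIX, and Perl character strings, given an encoding name.
sub filename_to_chars {
    my ($bytes, $encoding) = @_;
    return decode($encoding, $bytes);
}

sub chars_to_filename {
    my ($chars, $encoding) = @_;
    return encode($encoding, $chars);
}

# The three bytes mentioned earlier in this message (0xb1 0xe6 0xea)
# are the ISO-8859-2 encoding of U+0105, U+0107, U+0119.
my $chars = filename_to_chars("\xb1\xe6\xea", "iso-8859-2");
my $bytes = chars_to_filename($chars, "iso-8859-2");
```

The bytes returned by chars_to_filename are what would then be passed
to open, opendir, and friends.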

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'."
-- Jack Cohen
