Assuming that what I just wrote is right (if it isn't, please explain why),
both strategies work, in any Perl/locale combination, only on some files,
right? Whatever I do, if I feed a script a random collection of files,
each in a different encoding, some of them will be treated incorrectly,
won't they?
But surely, it's not a unicode problem, is it? I mean, even without
unicode a file might be encoded in Shift-JIS, iso-8859-1 or BIG5, and
you'd have no obvious, immediate way of telling...
I agree that it's a terrible thing that files don't carry standard
metadata headers the way emails do, but what can you do? I don't
think what you're describing is a unicode problem; it's more of an OS
issue. Wouldn't it be great if unix had built-in setprop (key, value) /
getprop (key) primitives for files?
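For what it's worth, Linux extended attributes come fairly close. A
minimal sketch of setprop/getprop on top of the setfattr/getfattr
command-line tools (the sub names are mine; this assumes the Linux
'attr' package and a filesystem with user xattrs enabled, so it's not
portable to other unixes):

    use strict;
    use warnings;

    # Store a key/value pair as a user extended attribute.
    sub setprop {
        my ($file, $key, $value) = @_;
        system('setfattr', '-n', "user.$key", '-v', $value, $file) == 0
            or die "setfattr failed on $file";
    }

    # Read it back; returns the raw attribute value.
    sub getprop {
        my ($file, $key) = @_;
        return qx(getfattr -n user.$key --only-values $file 2>/dev/null);
    }

    setprop('doc.txt', 'encoding', 'iso-8859-1');
    print getprop('doc.txt', 'encoding'), "\n";   # iso-8859-1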
Shouldn't a plain text file declare, hidden at its beginning, what its
encoding is? That way, whatever locale you set, a Perl script could
automagically treat each file correctly, right?
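Something like that could be hacked up today. A minimal sketch,
assuming a made-up "encoding: <name>" marker on the first line of the
file (the marker is my invention, not any standard):

    use strict;
    use warnings;

    # Look for the (hypothetical) encoding marker on the first line,
    # then rewind and push the matching PerlIO layer onto the handle.
    sub open_with_declared_encoding {
        my ($path) = @_;
        open my $fh, '<', $path or die "can't open $path: $!";
        my $first = <$fh>;
        $first = '' unless defined $first;
        my $enc = $first =~ /encoding:\s*([\w-]+)/ ? $1 : 'iso-8859-1';
        seek $fh, 0, 0;                     # back to the start
        binmode $fh, ":encoding($enc)";     # decode all further reads
        return $fh;
    }

(The fallback to iso-8859-1 when no marker is found is just a guess on
my part; pick whatever default suits your data.)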
Doesn't perl automagically open a file as UTF-8 when it's got a BOM? I
think that was the case in 5.6.1 (unless my memory's failing); I don't
know about 5.8.
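In any case, for data files it's easy enough to check for the BOM
yourself. A minimal sketch (as far as I know plain open() doesn't
honour a BOM on data files, so this peeks at the first three bytes by
hand):

    use strict;
    use warnings;

    sub open_bom_aware {
        my ($path) = @_;
        open my $fh, '<', $path or die "can't open $path: $!";
        binmode $fh;                          # raw bytes while we peek
        read $fh, my $bom, 3;
        if (defined $bom && $bom eq "\xEF\xBB\xBF") {
            binmode $fh, ':encoding(UTF-8)';  # UTF-8 BOM: decode
        }
        else {
            seek $fh, 0, 0;                   # no BOM: rewind, stay raw
        }
        return $fh;
    }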
Yes, once every file stored on every PC or CD-ROM in the world is
Unicode the problem will disappear, but what should one do now?
Hack around :)
Inside your own projects if you name your files with consistent practice
<filename>.<encoding>.<extension>, it's not too bad; see the sketch below.
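For instance, a minimal sketch of that convention (the regex for
pulling the encoding out of the name is just one way to do it; the
encoding names are whatever Encode understands):

    use strict;
    use warnings;

    # Open e.g. readme.iso-8859-1.txt with the layer named in the file.
    sub open_by_name {
        my ($path) = @_;
        my ($enc) = $path =~ /\.([\w-]+)\.[^.]+$/
            or die "no encoding in filename: $path";
        open my $fh, "<:encoding($enc)", $path
            or die "can't open $path: $!";
        return $fh;
    }

    my $fh = open_by_name('readme.iso-8859-1.txt');
    print while <$fh>;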
Cheers,
--
Building a better web - http://www.mkdoc.com/
---------------------------------------------
Jean-Michel Hiver
jhiver(_at_)mkdoc(_dot_)com - +44 (0)114 255 8097
Homepage: http://www.webmatrix.net/