perl-unicode

Very general question about Unicode processing

2003-04-16 15:30:05
Hello,

some time ago I asked for help on this list about how to make Perl
scripts work again with Perl 5.8 and the default locale settings in
Red Hat 8. On that occasion, and in all the other related posts I've
found, the solutions fell into two categories, AFAICT:

1) set some environment variables so the script knows that what's
   coming is encoded in this or that way

2) modify the script so that it opens the file in a manner consistent
   with the file encoding (the bytes / no utf8 pragmas, and such)
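
To make the second strategy concrete, here is a minimal sketch of what I
understand it to mean: the script names the encoding itself when it opens
the file, instead of inheriting it from the locale. The helper name
read_decoded and the Latin-1 sample data are only illustrations, not
something taken from the earlier replies.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file, decoding it with an explicitly
# named encoding rather than whatever the locale implies.
sub read_decoded {
    my ($path, $encoding) = @_;
    open(my $fh, "<:encoding($encoding)", $path)
        or die "Cannot open $path: $!";
    local $/;                 # slurp the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Write a small Latin-1 sample file as raw bytes.
my ($out, $path) = tempfile();
binmode $out;
print {$out} "caf\xE9\n";     # "cafe" with e-acute, as a single Latin-1 byte
close $out;

# The caller still has to know the encoding in advance.
my $text = read_decoded($path, 'iso-8859-1');
printf "read %d characters\n", length $text;
```

Note that the encoding name is still supplied from outside the file, which
is exactly the point I am asking about below.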

Assuming that what I just wrote is right (if it isn't, please explain why),
both strategies work, in any Perl/locale setting combination, only on
some files, right? Whatever I do, if I feed a script a random combination
of files, each encoded in a different way, some of them will be
treated incorrectly, won't they?

Shouldn't a plain text file have written inside, hidden, at its beginning,
what its encoding is? In this way, whatever locale you set, a Perl script
would automagically treat each file correctly, right?
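
The closest existing thing I am aware of is the Unicode byte order mark
(BOM) that some Unicode files carry in their first bytes. A rough sketch
of how a script could look for it follows; sniff_bom is a made-up helper
name, and it only recognises the Unicode signatures, so a legacy 8-bit
file would still come back as "unknown".

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding from a leading byte order mark.
# Returns (encoding name, BOM length in bytes), or an empty list if the
# data does not start with a known BOM.
sub sniff_bom {
    my ($bytes) = @_;
    return ('UTF-8',    3) if $bytes =~ /\A\xEF\xBB\xBF/;
    # The UTF-32 checks must come first: the UTF-32LE mark starts with
    # the same two bytes as the UTF-16LE mark.
    return ('UTF-32LE', 4) if $bytes =~ /\A\xFF\xFE\x00\x00/;
    return ('UTF-32BE', 4) if $bytes =~ /\A\x00\x00\xFE\xFF/;
    return ('UTF-16LE', 2) if $bytes =~ /\A\xFF\xFE/;
    return ('UTF-16BE', 2) if $bytes =~ /\A\xFE\xFF/;
    return;
}

for my $sample ("\xEF\xBB\xBFhello", "\xFE\xFF\x00h", "plain text") {
    my ($enc, $len) = sniff_bom($sample);
    print defined $enc
        ? "$enc (BOM is $len bytes)\n"
        : "no BOM, encoding unknown\n";
}
```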

In other words, Unicode is good, but isn't it fundamentally wrong that
the responsibility to declare a *file*'s internal encoding must fall
outside it, i.e. on the locale setting, or on the script that should
process it? How could this scale with files which pass through many
different scripts and OSes (e.g. dictionaries, fortune files, whatever)?

Yes, when every file stored on every PC or CD-ROM in the world is
Unicode the problem will disappear, but what should one do now?

What do you think? Have I indeed seen some maybe obvious but objective
truth, or have I just made a fool of myself by missing something?
(please be understanding and supportive in the second case...)

Thanks for your patience  

        Marco Fioretti
 
-- 
Marco Fioretti                 m.fioretti, at the server inwind.it
Red Hat for low memory         http://www.rule-project.org/en/

