perl-unicode

Do we have ONE single page on Perl/UTF-8 problems?

2003-03-22 02:30:04
Hello,

Some days ago I asked here for help with running the mb2md script on
Red Hat 8. In the end, setting LANG=en_US in the shell before running
the script fixed (or hid?) the problem.

For the record, almost the same thing happens on Red Hat 8.0 with the
attached script, which comes straight from the Perl Cookbook ftp page at
www.ora.com. It gives a tree view of the output of the du
command. On a standard xterm in RH8 it also gives a bunch of these
errors:

Malformed UTF-8 character (unexpected end of string) at
./SOFT_TMP/cookbook.examples/ch05/dutree line 17, <> line 773.
Malformed UTF-8 character (unexpected end of string) at
./SOFT_TMP/cookbook.examples/ch05/dutree line 19, <> line 773.

Once again, export LANG=en_US solves the problem.
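
For reference, the same workaround can be confined to a single command
instead of exporting LANG for the whole session. This is only a sketch,
assuming a POSIX shell; the function name run_legacy and the usage line
are hypothetical, and it is still the same workaround in a narrower
scope, not a real fix. If LC_ALL or LC_CTYPE is set, it takes precedence
over LANG and would need the same treatment.

```shell
#!/bin/sh
# run_legacy (hypothetical name): run one command with LANG=en_US
# without touching the locale of the calling shell.
run_legacy() {
    # env changes the environment of the child process only.
    env LANG=en_US "$@"
}

# Hypothetical usage, with the script path from the messages above:
#   run_legacy ./SOFT_TMP/cookbook.examples/ch05/dutree
```

After the call returns, the calling shell still has its original LANG,
so nothing has to be restored by hand.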

After these two simple test cases, the actual question: 

Moving to Unicode/UTF-8 is a very good thing.
Even so, there is no denying that {old, third-party} Perl scripts
working on {old, randomly encoded} text files break when run in the
default Perl/shell environment of Red Hat 8.0, and I guess the same
will happen in other distros as they move to Unicode.

As far as I know, all the fixes suggested so far either work only for
some scripts, or are not real fixes (i.e., as above, they just recreate
a pre-UTF-8 environment around the script).

Is there one single Perl page specifying:

        what must be changed in scripts so that one does NOT need to
        alter environment variables before and after running Perl
        programs, including one-liners

        what kind of shell wrappers one must use when there is no
        possible solution of the kind above

If there is no such page, why not? (I assume it doesn't exist, because
after my first post here nobody pointed me to one.) It could say
things like:

        "Do this and this to the scripts to make them work again"

        "This and that specific behaviors are bugs in Perl, and we
        have to wait until they are fixed"

        "This and that specific behaviors mean that the *script* is
        hopelessly broken, and should be rewritten (ditto for specific
        perl modules)"

Any feedback is welcome!

        Ciao,
                Marco Fioretti

-- 
Marco Fioretti                 m.fioretti, at the server inwind.it
Red Hat for low memory         http://www.rule-project.org/en/

We are drowning in information but starved for knowledge.
                                      -- John Naisbitt, Megatrends
