perl-unicode

Reading/writing non-Unicode files with perl5.8?

2003-01-14 02:30:04
I'm a longtime 5.005/5.6.1 user.  I recently upgraded my
Linux system to RH8.0 and got perl5.8 in the bargain.  I
have many perl scripts that read or write non-Unicode files,
mostly ANSI files.  Many of those scripts have broken,
seemingly because of Unicode-forcing behavior in perl5.8.

(It is possible that some other part of my system upgrade is
responsible, like maybe my shell; if anyone knows of some
kind of system-wide Unicode infestation that could be the
cause of these problems, please let me know!)


WRITING:
perl -e 'print pack("H6", "31a931")' > foo

This produces a file with four bytes: 31, c2, a9, 31,
whereas 5.6 would write exactly the three bytes I
specified.  I have tried all manner of tricks, but I just
cannot seem to write a file from perl containing only those
three bytes.  I understand the Unicode translation that is
happening here (0xa9 is being encoded as the two-byte UTF-8
sequence c2 a9); I just don't want it!


READING:
perl -e '$c = <STDIN>; while ($c =~ m/./g) {print pos($c), "\n"}' < foo

(This requires a file 'foo' with exactly the three bytes I
listed above: 31, a9, 31)

Output:
1
Malformed UTF-8 character (unexpected continuation byte 0xa9, with no preceding start byte) in match position at -e line 1, <STDIN> line 1.
2
Malformed UTF-8 character (unexpected continuation byte 0xa9, with no preceding start byte) in match position at -e line 1, <STDIN> line 1.
3

In this case the "malformed UTF-8 character" messages don't
seem to be causing any harm, but they're certainly annoying,
and I have seen other cases (can provide if necessary) where
the script in fact behaves differently.

What I'm reading is not a UTF-8 file - it's an ANSI file!
Is there some way to tell perl to just read the bytes without
translation?


Many thanks in advance.
d.
