perl-unicode

Re: Unicode problems with mb2md-3.10 on Red Hat 8.0

2003-03-09 14:30:04

m(_dot_)fioretti(_at_)inwind(_dot_)it said:
> 1) (non UNICODE question, I think) to make the script start with the
> Perl mentioned above, I had to replace "concat" with "open" in line
> 1251: what is concat? I *never* found it in my Perl manuals. Is "open"
> equivalent?

Odd: when I downloaded this same script just now, there was no "concat"
anywhere in it. Line 1251 was:

   open(OUT, ">$messagefn") or die("Fatal: unable to create new message $messagefn");

(Why would your mb2md-3.10.pl be different from mine? I got mine from 
the URL you cited.)

> 2) (Unicode question) The script gives tons of warnings like these:
>
>   Malformed UTF-8 character (unexpected end of string) at ./mb2md-3.10.pl line 1343, <MBOX> line 3009.
>
>   Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xc8) in pattern match (m//) at ./mb2md-3.10.pl line 999, <MBOX> line 9706.

The mbox file you're reading as input has some bytes with the high bit
set, and these are not part of UTF-8 multi-byte characters (a start byte
such as 0xc8 must be followed by a continuation byte in the range
0x80-0xbf, not by a space, 0x20). On top of that, something in your
environment (most likely a locale setting such as LANG=en_US.UTF-8) is
inducing your perl 5.8.0 to treat files opened for input as UTF-8 text,
as if a ":utf8" layer had been applied to each of them.

I think other discussions on the perl-unicode list have concluded that
this behavior needs to be fixed in the 5.8.1 release (i.e. file I/O
should always be "binary" unless explicitly declared otherwise via
open(), binmode() or a PerlIO layer, and 5.8.0 breaks this).

In the meantime, you could try altering your copy of mb2md to add this
line (e.g. next to "use strict;"):

   no utf8;

and see if that allows your mbox file to be treated the way you (and 
the script author) expect.
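If "no utf8;" makes no difference (that pragma governs how perl reads the
script's own source text, not how it reads data files), the explicit route
mentioned above is to declare the handles binary yourself. A minimal
sketch, assuming three-argument open() with the ':raw' layer; the helper
copy_binary is hypothetical and not part of mb2md:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mb2md): copy $from to $to byte for byte.
# Both handles are opened with an explicit ':raw' layer, so a UTF-8 layer
# implied by the locale should not decode the input or re-encode the output.
sub copy_binary {
    my ($from, $to) = @_;
    open(my $in,  '<:raw', $from) or die "Fatal: unable to open $from: $!";
    open(my $out, '>:raw', $to)   or die "Fatal: unable to create $to: $!";
    while (my $line = <$in>) {
        # $line is a plain byte string; regexes on it see octets, not characters.
        print {$out} $line;
    }
    close($in);
    close($out) or die "Fatal: unable to write $to: $!";
    return;
}

# Usage: perl copy_binary.pl INPUT OUTPUT   (script name is a placeholder)
copy_binary(@ARGV) if @ARGV == 2;

# For a bareword handle opened with two-argument open(), like MBOX in mb2md,
# calling binmode() right after the open() should have the same effect
# ($mbox below is a placeholder for whatever variable holds the path):
#   open(MBOX, $mbox) or die "Fatal: unable to open $mbox";
#   binmode(MBOX);
```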

        Dave Graff

