Re: UTF-8 filtering in MHonArc

2005-05-22 17:32:42
On May 22, 2005 at 11:59, Jeff Breidenbach wrote:

I am downloading Perl 5.8.6 to upgrade my 5.8.0 install and see whether
the behavior changes. I am also experimenting with the mhonarc code to
see if I can get consistent results and a better understanding of
Perl's behavior.

Please let me know how that goes. FYI, Debian Sarge ships with perl
5.8.4, but it isn't too hard for us to track some other version of
perl, including perl 6.

I have not messed with perl 6. I do not think there will be any
differences between 5.8.4 and 5.8.6. The differences I saw were
between 5.6.1 and 5.8.x. I discovered that I had to enable
warnings (via the warnings pragma) to get perl 5.8 to be stricter
about UTF-8 decoding; 5.6.1 was strict by default.

There are also some other minor subtleties, which I believe I have
dealt with.
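To make "strict" decoding concrete: in Perl the switch is the warnings pragma, as described above. The same idea can be illustrated outside Perl. The sketch below is Python, not mhonarc code, and the helper name `find_malformed_utf8` is hypothetical. Python's UTF-8 codec is strict by default, so malformed input raises an error that reports the byte range of the first bad sequence instead of being silently accepted.

```python
from typing import Optional, Tuple


def find_malformed_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return None if `data` is well-formed UTF-8, otherwise the
    (start, end) byte offsets of the first malformed sequence."""
    try:
        data.decode("utf-8")  # errors="strict" is the default: raise, don't guess
    except UnicodeDecodeError as exc:
        return (exc.start, exc.end)
    return None


print(find_malformed_utf8("résumé".encode("utf-8")))  # None
print(find_malformed_utf8("été".encode("latin-1")))   # (0, 1)
```

The second call shows a common real-world case: Latin-1 text mislabelled as UTF-8 fails at its first accented character.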

Is it wise to suppress the malformed UTF-8 warning messages? If
mhonarc handles the sequences properly, I'm not sure there is any
value in printing the warnings.

Good question. I generally don't like to have common-but-harmless 
warnings in log files, lest they obscure more serious problems.
But I'm also quite capable of doing my own filtering. For example,

I checked into CVS a new version that suppresses the warning
messages and converts malformed sequences into "�" (U+FFFD, the
replacement character), following Unicode recommendations.

I was even able to beef up the <=5.006 code (used with Perl 5.6 and
earlier) to be more robust against malformed sequences.
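The replacement strategy can be sketched roughly as follows. This is an illustrative Python version, not the Perl code checked into CVS. It assumes the Unicode "maximal subpart" practice (Unicode Standard, chapter 3): each ill-formed subsequence becomes a single U+FFFD, and decoding resumes at the first byte that could not belong to it. Whether mhonarc follows exactly this rule is an assumption; the names `decode_utf8_replace` and `_lead_info` are hypothetical.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def _lead_info(b0: int):
    """For a non-ASCII lead byte, return (continuation count, lo, hi), where
    lo..hi is the allowed range of the FIRST continuation byte (per the
    well-formed byte sequence table), or None if b0 can never start one."""
    if 0xC2 <= b0 <= 0xDF:
        return 1, 0x80, 0xBF
    if b0 == 0xE0:
        return 2, 0xA0, 0xBF          # rules out overlong 3-byte forms
    if 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        return 2, 0x80, 0xBF
    if b0 == 0xED:
        return 2, 0x80, 0x9F          # rules out surrogates U+D800..U+DFFF
    if b0 == 0xF0:
        return 3, 0x90, 0xBF          # rules out overlong 4-byte forms
    if 0xF1 <= b0 <= 0xF3:
        return 3, 0x80, 0xBF
    if b0 == 0xF4:
        return 3, 0x80, 0x8F          # rules out code points above U+10FFFF
    return None                       # 80..C1 and F5..FF


def decode_utf8_replace(data: bytes) -> str:
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                 # plain ASCII
            out.append(chr(b0))
            i += 1
            continue
        info = _lead_info(b0)
        if info is None:              # stray or invalid byte: replace it alone
            out.append(REPLACEMENT)
            i += 1
            continue
        need, lo, hi = info
        cp = b0 & (0x7F >> (need + 1))  # payload bits of the lead byte
        j = i + 1
        for _ in range(need):
            if j >= n or not (lo <= data[j] <= hi):
                break                 # truncated input or bad continuation byte
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
            lo, hi = 0x80, 0xBF       # later continuation bytes: any 80..BF
        else:
            out.append(chr(cp))       # complete, well-formed sequence
            i = j
            continue
        # One U+FFFD for the maximal subpart data[i:j]; resume at the offending byte.
        out.append(REPLACEMENT)
        i = j
    return "".join(out)


print(ascii(decode_utf8_replace(b"caf\xe9 ok")))    # 'caf\ufffd ok'
print(ascii(decode_utf8_replace(b"\xf0\x9f\x98")))  # '\ufffd'
```

Python's built-in `bytes.decode("utf-8", "replace")` follows the same maximal-subpart practice, which makes it a convenient cross-check for a hand-written decoder like this one.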

If you want to try it out, grab the next snapshot build, or you
can download the file from the newly added CVS source browsing at

I currently filter out mhonarc's unrecognized character set warnings, 
since they are common and there's not much I can do about them.

See the CHARSETALIASES resource.
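The CHARSETALIASES resource lets you give mhonarc alias names for character sets, so that a label it does not recognize can be mapped to one it does, instead of filtering the warnings out of the logs afterwards (see the MHonArc resource documentation for the actual syntax). The underlying idea, an alias table consulted before the real lookup, can be sketched in Python. This is not MHonArc's syntax or code; the table entries and the `resolve_charset` name are hypothetical examples.

```python
import codecs
from typing import Optional

# Hypothetical site-specific alias table: non-standard labels seen in
# mail headers, mapped to charset names the converter already knows.
CHARSET_ALIASES = {
    "x-user-defined": "windows-1252",
    "x-unknown": "us-ascii",
}


def resolve_charset(label: str) -> Optional[str]:
    """Return the canonical codec name for a charset label, or None."""
    key = label.strip().lower()
    key = CHARSET_ALIASES.get(key, key)   # apply local aliases first
    try:
        return codecs.lookup(key).name    # Python's own alias normalization
    except LookupError:
        return None                       # still unrecognized: caller may warn


print(resolve_charset("X-User-Defined"))   # cp1252
print(resolve_charset("no-such-charset"))  # None
```

Only labels that survive both the local table and the built-in lookup produce a warning, which keeps the remaining warnings meaningful.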


