mhonarc-dev

UTF-8 filtering in MHonArc (was Re: lots of UTF-8 warnings)

2005-05-22 09:32:50
Jeff,

I've been doing some more work on the utf-8 code in CharEnt.pm, and
I believe I have a method for dealing with the malformed sequences
without generating warning messages.  Along with that, I'm trying to
make the code more robust.

I'm doing some testing, but there are some inconsistencies with
different versions of Perl.  I currently have Perl versions 5.6.1 and
5.8.0 installed, and noticed that unpack() behaves a little differently
between the two.

I am downloading 5.8.6 to upgrade my 5.8.0 install to see if behavior
changes, along with playing with mhonarc code to see if I can get
some consistency and a better understanding of Perl's behavior.

With Unicode-aware Perl (>=5.6), I am trying to convert malformed
utf-8 sequences to U+FFFD (REPLACEMENT CHARACTER), the recommended
practice for dealing with malformed sequences.  Doing this for Perl <5.6 may require more work
since utf-8 handling is done with a homegrown method.  I may leave
that code as-is since I'm not sure it is worth the effort for older
non-utf-8-aware versions of Perl.  If users are concerned about
robust utf-8 support, they should use the latest version of Perl.
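For illustration only, the replace-with-U+FFFD idea can be sketched in
Python, whose bytes.decode() supports an errors="replace" mode that
substitutes U+FFFD for each malformed sequence instead of raising an
error.  This is not the Perl code in CharEnt.pm; the function name
sanitize_utf8 is hypothetical.

```python
def sanitize_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, substituting U+FFFD for malformed sequences.

    Hypothetical helper for illustration; not part of MHonArc.
    """
    # errors="replace" inserts U+FFFD (REPLACEMENT CHARACTER) wherever the
    # input is not valid UTF-8, so the caller never sees an exception.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8.
    print(repr(sanitize_utf8(b"abc\xffdef")))
    # Valid input passes through unchanged.
    print(repr(sanitize_utf8("h\u00e9llo".encode("utf-8"))))
```

Because the bad bytes are replaced rather than dropped, readers of the
archive can still see that something was wrong at that point in the
message.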

--ewh

P.S. Is it wise to suppress the malformed utf-8 warning messages?
If mhonarc can deal with the sequences in a proper manner, I'm not
sure if there is any value in printing the warnings.

