Re: [approved] From field in ISO-2022-JP

2002-12-02 12:44:38
On December 2, 2002 at 10:57, Koichi Nakatani wrote:

You're right, and current MHonArc cannot treat this correctly.

Can someone provide me with a sample message that shows this problem?
I'd like to have it as a test case.

I think messages from Mr. Ogawa are broken, and I cannot see any
correct methods to handle incorrect messages.

According to RFC 2047, you have to use encoded-words to embed
non ASCII characters in message headers.

The message is legal.  As you noted in a later message, it uses the
iso-2022-jp encoding.  MHonArc's mail address detection does not
consider the encoding; it works at the raw octet level.
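
A rough sketch of why scanning raw octets can go wrong, written in
Python rather than MHonArc's Perl.  The pattern is a deliberately naive
stand-in and not MHonArc's actual address detection, and both helper
functions are hypothetical.  The octets are the base64 payload of the
encoded-word in the From field under discussion; in iso-2022-jp the
second character happens to contain the octet 0x40, which is also
ASCII "@".

```python
import base64
import re

# Payload of the encoded-word from the From field in this thread.
RAW = base64.b64decode("GyRCPi5AbhsoQg==")

# Deliberately naive stand-in for address detection on raw octets
# (an assumption for illustration, not MHonArc's real pattern).
NAIVE_ADDR = re.compile(rb"[\w.+-]+@[\w.-]+")


def find_addresses_raw(octets: bytes) -> list:
    """Scan undecoded octets for anything that looks like an address."""
    return NAIVE_ADDR.findall(octets)


def find_addresses_decoded(octets: bytes, charset: str) -> list:
    """Decode to characters first, then scan the decoded text."""
    text = octets.decode(charset)
    return re.findall(r"[\w.+-]+@[\w.-]+", text, flags=re.ASCII)


if __name__ == "__main__":
    print(RAW)                                        # escape sequences plus JIS octets
    print(find_addresses_raw(RAW))                    # false match inside the name
    print(find_addresses_decoded(RAW, "iso2022_jp"))  # no match once decoded
```

The raw scan reports a bogus "address" inside the sender's name, while
the decoded text contains no "@" at all.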

Now, the newer MHonArc::CharEnt in CVS and in the snapshot builds
converts iso-2022-jp to Unicode character entity references.  When
using it, the mailto linking works as expected.

The from field:

  =?iso-2022-jp?B?GyRCPi5AbhsoQg==?= <hisaddress(_at_)example(_dot_)com>

is converted to the following:

  &#x5C0F;&#x5DDD;&lt;<a href="mailto:hisaddress(_at_)example(_dot_)com"

[line break added for readability]
I'm assuming the Unicode values are correct since I cannot read
Japanese.
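
For anyone who wants to cross-check the values independently, here is
a minimal sketch in Python (not MHonArc::CharEnt itself).  It decodes
the RFC 2047 encoded-word with the standard email.header module and
writes every non-ASCII character as a hexadecimal character entity
reference.  The function name header_to_entities is hypothetical, and
the sketch covers only the character conversion, not the mailto
linking.

```python
from email.header import decode_header
import html


def header_to_entities(value: str) -> str:
    """Decode RFC 2047 encoded-words, then emit ASCII-only HTML text."""
    out = []
    for chunk, charset in decode_header(value):
        # decode_header yields bytes for encoded-words (and for plain
        # text that sits next to them); plain headers come back as str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        for ch in chunk:
            if ord(ch) < 128:
                out.append(html.escape(ch))        # escape &, <, > and quotes
            else:
                out.append("&#x%04X;" % ord(ch))   # e.g. U+5C0F -> &#x5C0F;
    return "".join(out)


if __name__ == "__main__":
    field = ("=?iso-2022-jp?B?GyRCPi5AbhsoQg==?= "
             "<hisaddress(_at_)example(_dot_)com>")
    print(header_to_entities(field))
```

The name portion comes out as the same two entity references shown
above, which suggests the code points are right.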

Therefore, the question is, "Should MHonArc::CharEnt become the
default converter for iso-2022-jp data?"
(I asked the question on the mhonarc-dev list, but since the
number of subscribers is small, I'll ask on this list.)

MHonArc::CharEnt is written in pure Perl and does not depend
on any non-standard modules, so it should work under any version
of Perl 5 (notably versions older than 5.6.1).

In testing with Mozilla (via Galeon) on Linux, and with Mozilla and
IE 6 on Windows, all browsers were able to load the proper font glyphs
for Unicode character entity references (if the fonts are installed;
it took me some time to find fonts for Windows that I could install),
independent of what the actual document character set is.
I have not tested text browsers like w3m or lynx.
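
The charset independence follows from the references being plain
ASCII.  A small Python sketch of that point, with html.unescape
standing in for a browser's parser (an assumption; it says nothing
about fonts), and page() being a hypothetical helper:

```python
import html

# The converted name from the From field, as ASCII-only entity references.
ENTITIES = "&#x5C0F;&#x5DDD;"


def page(charset: str) -> bytes:
    """Build a tiny HTML page that declares the given document charset."""
    doc = ('<html><head><meta http-equiv="Content-Type" '
           'content="text/html; charset=%s"></head>'
           '<body>%s</body></html>' % (charset, ENTITIES))
    # Encoding as ASCII succeeds because the page holds no 8-bit data.
    return doc.encode("ascii")


if __name__ == "__main__":
    for charset in ("iso-8859-1", "euc-jp", "utf-8"):
        print(page(charset))
    # The references resolve to the same code points in every case.
    print([hex(ord(c)) for c in html.unescape(ENTITIES)])
```

Apart from the declared charset label, the page octets are identical,
so the same references work whatever the document character set is.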
