Re: How to avoid auto-linking in non-ascii URLs

2006-03-22 11:09:52
On March 23, 2006 at 01:36, Masao Takaku wrote:

MHonArc automatically turns URL-like strings into links.
When a message includes a string such as "See" followed by a URL,
MHonArc processes it as follows:

See <a href="">

It works well, but when a URL-like string is followed by non-ASCII
text without a space, this feature is not useful.
For example, "を見て.", which means
"See" in Japanese, comes out as follows:

<a href="&#x898B;&#x3066;">http://www.e&#x898B;&#x3066;</a>.

In this example, the output should look like the following:

<a href=""></a>

My environment is Perl-5.8.0 and MHonArc-2.6.15 (default settings).

Does anyone know how to do this, or any workarounds?

First, you may want to check out <> for
Japanese-specific usage information for MHonArc.  There should also
be links to a Japanese-based mailing list, which may be useful.

As for your specific problem, you may need to disable URL linking.
This can be done by specifying -nourl on the command line or
<NOURL> in your resource file.  The '&' is a legal URL character,
and MHonArc does not try to interpret what a character entity
reference resolves to when deciding whether it belongs in the URL.

The URL linking code is a single regex operation.
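
To make the failure mode concrete, here is a minimal Python sketch of
single-regex auto-linking applied to text whose non-ASCII characters
have already been turned into numeric character references.  It is an
illustration only, not MHonArc's Perl code: the pattern URL_RE, the
helpers to_entities and autolink, and the example.org URL are all
hypothetical.  Because '&', '#' and ';' are legal URL characters, the
references that follow the URL are matched as part of it.

```python
import re

# Hypothetical URL pattern, NOT MHonArc's actual regex: it accepts any run of
# characters that are legal in a URL, which includes '&', '#' and ';'.
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):04X};" for c in text)


def autolink(html: str) -> str:
    """Wrap each URL-like match in an anchor element, in one regex pass."""
    return URL_RE.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', html)


# Placeholder URL followed directly by Japanese text, with no space between.
message = "http://www.example.org/を見て"
linked = autolink(to_entities(message))
print(linked)
```

In this sketch the regex only sees ASCII, so it cannot tell a
reference that stands for a Japanese character from an ordinary query
string, and the whole run ends up inside the href.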

I'm not sure at this time what code changes could be made.
If you use ISO-2022-JP encoding for your archives, it may
avoid this problem.
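
One plausible reason ISO-2022-JP could help, offered as an assumption
rather than something checked against MHonArc's code: in that encoding
a run of Japanese text is introduced by an escape sequence, and the ESC
byte is not a legal URL character, so a URL pattern would stop there.
The Python sketch below only shows what the encoded bytes look like;
the example.org URL is a hypothetical placeholder.

```python
# Placeholder URL followed directly by Japanese text, with no space between.
message = "http://www.example.org/を見て"
encoded = message.encode("iso2022_jp")

# The ASCII part is unchanged; the Japanese part starts with ESC $ B.
url_part = b"http://www.example.org/"
rest = encoded[len(url_part):]
print(rest[:3])
```

Whether this actually avoids the problem depends on the archive
keeping the text in ISO-2022-JP at the point where URLs are linked.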