Re: MHonArc and multi-byte characters in HTML

Hmm..., all I want to know is whether converting to
EUC(-JP) or UTF-8 solves MHonArc's problem (e.g.,
<TITLE>$SUBJECTNA:72$</TITLE>) or not.

In other words, I just want to know which problems your
patch will solve and which it will not.

That's all.
(And I believe we should make this clear before applying the
patch to MHonArc.)



Anyway,

From: Koichi Nakatani <nakatani(_at_)konica(_dot_)co(_dot_)jp>
Subject: Re: MHonArc and multi-byte characters in HTML
Date: Tue, 02 Oct 2001 08:50:36 +0900

> Instead of answering your question, I would like to ask you one.
> What should the "charset" parameter of HTML files generated
> by MHonArc be?

Sorry, but I don't understand what you mean.
More precisely, I don't understand how my question
relates to yours.
In fact, what I'm talking about is NOT the charset of HTML
messages but how MHonArc treats (processes) multi-byte
characters.


>> Possibly I have a misunderstanding about the `chimera' state
>> (I'm sure this does not mean the name of the WWW browser ;-).

> If you want to understand what I mean, you need to understand the relationship
> between the charset of HTML messages and how MHonArc treats multi-byte characters.
>   HTML generators like MHonArc are responsible for providing a means to avoid
> a character-encoding chimera state in HTML files.


Why?
Doesn't the charset in the original (RFC822) message work?
Of course I know that few browsers support, for example,
iso-2022-jp-2, and converting to UTF-8 may help in such a
situation.

But I cannot understand why converting to EUC-JP would avoid
a character-encoding chimera state in HTML files.

Or does your `chimera' state mean the following situation?
| Subject: =?iso-8859-1?Q?......?=
| Content-Type: Text/Plain; charset=iso-2022-jp

You know, converting to EUC-JP doesn't help in such cases
either.

>   In practice, that means MHonArc is responsible for providing a means to
> generate UTF-8 files at the user's option.


I agree, and I've NEVER disputed this, eh?

But I still don't understand the relationship between the
charset of HTML messages and how MHonArc treats multi-byte
characters.


What I want to say is that, if we want MHonArc to process
multi-byte characters like iso-2022-jp{,-2} (including
UTF-8) correctly, we need additional functionality, for
example, by enhancing lib/iso2022jp.pl.

To put it concretely, I think we need functions such as one
that splits multi-byte character strings at appropriate
boundaries (and I've been planning to write such code for a
long time but haven't had enough time...).

-- 
Takashi P.KATOH