procmail

Re: Filter for Japanese double-byte characters?

1999-10-07 06:29:34
Dick,

In a message dated Thu, 07 Oct 1999 01:37:02 -0700 (PDT), you wrote:
> OK, I just checked the full headers again for a bunch of posts from the
> honyaku list, and "text/plain; charset=ISO-2022-JP" appears in none of
> them.
8<- snip 8<-
> The two lists (Honyaku and JAT-LIST) are lists for E-to-J and J-to-E
> translators, both Japanese and non-Japanese, many residing outside of
> Japan.  The posts are mostly in English, but many contain some
> Japanese text (a lot of questions and answers about "How do I
> translate xxxx", where xxxx is in Japanese).  I want a recipe that
> will bounce these posts with Japanese to an address where I can read
> them.
> Currently I'm using Yahoo!'s email service, which works well with
> Japanese (I use IE5 with the necessary Japanese add-ons).

OK, I see your point now.  Is the "honyaku" list hosted by onelist.com?
If so, the problem is specific to that list.  As a member of another
onelist.com-hosted mailing list, I can point out some problems with
those lists.

As you may know, members can post messages to the list not only by
sending mail but also by using an online form.  I searched the archived
messages of the "honyaku" list and found that the character encoding of
individual messages was not consistent.  I found "Shift_JIS" (aka
MS-Kanji) encoded messages and "ISO-2022-JP" (aka JIS Kanji) encoded
messages.  There may also be "EUC-JP" (Extended Unix Code) encoded
messages in the archive.  I suspect that messages posted via the online
form are delivered without a correct "Content-Type:" field in the
message header at all.
Oh well... :-(
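To see why a filter keyed on the declared charset misses these posts,
here is a small Python sketch (Python is used only for illustration;
`declared_charset` is a hypothetical helper name, not part of procmail).
It reads the charset parameter with the standard `email` package, which
yields None when a message has no Content-Type header, and shows that the
same Japanese word becomes different bytes in each of the three encodings
mentioned above.

```python
from email import message_from_bytes
from typing import Optional


def declared_charset(raw: bytes) -> Optional[str]:
    """Return the charset named in the Content-Type header, lower-cased.

    Hypothetical helper: returns None when the message has no Content-Type
    header or the header carries no charset parameter, as with the
    web-form posts described above.
    """
    return message_from_bytes(raw).get_content_charset()


if __name__ == "__main__":
    with_header = b"Content-Type: text/plain; charset=ISO-2022-JP\n\nbody\n"
    without_header = b"Subject: How do I translate this?\n\nbody\n"
    print(declared_charset(with_header))     # iso-2022-jp
    print(declared_charset(without_header))  # None

    # The same word ("honyaku", translation) as raw bytes in each encoding.
    for codec in ("iso-2022-jp", "shift_jis", "euc-jp"):
        print(codec, "翻訳".encode(codec))
```

A message with no declared charset gives the filter nothing to match on,
so the body bytes themselves have to be examined.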

In this case, just detecting ISO-2022-JP encoded messages by searching
the message body for the "[ESC]$B" pattern may not be enough.  You may
have to auto-detect the Japanese character encoding by searching the
message body for byte patterns unique to each encoding, as most
Japanese text editors and web browsers do.
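A minimal Python sketch of that kind of auto-detection follows.  It
assumes the body is available as raw bytes, and `guess_japanese_encoding`
is a hypothetical name.  It checks for the ISO-2022-JP kanji escape
sequences first, then for bytes that can only be Shift_JIS lead bytes,
and otherwise falls back to trial decoding, which is a much simpler
technique than the detectors in real browsers and editors.

```python
from typing import Optional

# ISO-2022-JP switches into JIS X 0208 kanji with ESC $ B (1983)
# or ESC $ @ (1978).
ISO2022_KANJI_IN = (b"\x1b$B", b"\x1b$@")


def guess_japanese_encoding(body: bytes) -> Optional[str]:
    """Heuristically guess the Japanese encoding of a message body.

    Hypothetical helper.  Returns "iso-2022-jp", "shift_jis", "euc-jp",
    or None if no Japanese-looking bytes are found.
    """
    # 1. ISO-2022-JP is 7-bit; kanji runs start with an escape sequence.
    if any(seq in body for seq in ISO2022_KANJI_IN):
        return "iso-2022-jp"

    high = [b for b in body if b >= 0x80]
    if not high:
        return None  # plain 7-bit text with no JIS escapes

    # 2. Bytes 0x81-0x8D and 0x90-0x9F are Shift_JIS lead bytes but never
    #    occur in EUC-JP, which only uses 0x8E, 0x8F and 0xA1-0xFE.
    if any(0x81 <= b <= 0x8D or 0x90 <= b <= 0x9F for b in high):
        return "shift_jis"

    # 3. No Shift_JIS-only bytes: fall back to whichever codec decodes
    #    the body without errors, trying EUC-JP first.
    for codec in ("euc-jp", "shift_jis"):
        try:
            body.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return None


if __name__ == "__main__":
    for codec in ("iso-2022-jp", "shift_jis", "euc-jp"):
        sample = "こんにちは".encode(codec)
        print(codec, "->", guess_japanese_encoding(sample))
    print("ascii ->", guess_japanese_encoding(b"How do I translate this?"))
```

A filter could send any message for which this returns a non-None value
to the address where you read Japanese mail.  Common hiragana in
Shift_JIS start with 0x82 or 0x83, so step 2 catches most real
Shift_JIS text; step 3 is only a fallback for short or unusual bodies.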

# Better yet, perhaps you should ask onelist.com to fix this problem first.

Good luck,
________________________________________________
Satoru "Sam" MANITA - Saitama JAPAN
aka <satoru(_at_)manita(_dot_)com>