Re[2]: Re[2]: postprocessing filter for archive files

1999-12-14 19:57:02

I'm interested in the Japanese mail processing part of your post. 
Do you use Unix?
Linux for the mail, web, etc services.

Does MHonarc process JIS correctly?  I thought it doesn't (on Unix).
I don't know if it is, strictly speaking, "correct".
I looked at the generated HTML files directly, and also delivered
them via HTTP (Apache) to Netscape and Internet Explorer.
I believe I saw proper JIS, but not tagged as such.

Internet Explorer 4.0 mostly failed to see it, 
Netscape 4.x got it right with "Japanese autodetect",
so I think their behaviour is adequate considering what they get.

Does ht:/dig process EUC correctly?  I thought it doesn't.

Strictly speaking it is not supposed to,
but roughly, it does a reasonable job anyway.
We did not do anything special to the ht:/dig setup,
except make sure that all our HTML files are EUC.
You can try this anywhere at
Is this different from your expectation?

For Japanese code conversion, I use this sort of command as a 
preprocessing step instead of a postprocessing one. 
  nkf -e -m | mhonarc .....
As you may know, "-e" is for EUC conversion and "-m" for MIME header decoding. 
(Although "-m" may be handled internally by current mhonarc, I have used 
nkf -m for a long time.)
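
For readers without nkf at hand, the two conversions can be sketched
in Python. This is only an illustration of the idea, not what nkf does
internally; the function names jis_to_euc and decode_mime_header are
made up for this example, and nkf itself also autodetects the input
encoding, which this sketch does not.

```python
from email.header import decode_header, make_header


def jis_to_euc(raw: bytes) -> bytes:
    """Re-encode ISO-2022-JP ("JIS") bytes as EUC-JP (compare: nkf -e).

    Assumes the input really is ISO-2022-JP; no autodetection is done.
    """
    return raw.decode("iso2022_jp").encode("euc_jp")


def decode_mime_header(value: str) -> str:
    """Decode RFC 2047 encoded words in a header value (compare: nkf -m)."""
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    jis = "日本語".encode("iso2022_jp")
    euc = jis_to_euc(jis)
    # EUC-JP output carries no ESC sequences, unlike the JIS input.
    print(b"\x1b" in jis, b"\x1b" in euc)
    print(decode_mime_header("=?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?="))
```

The output of jis_to_euc is what would be piped on to mhonarc in the
command above.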

How simple. Interesting. Didn't even think about this. Thank you!

Of course, my command does not add <META> tags, which have to be 
handled as a postprocessing step. 

I believe this can be configured with mhonarc.
If you force everything to EUC in the first place I guess it does
not make a difference any more. 

Still: imho the proper thing to do would be to honor the language
of the incoming message all the way to the generated HTML.
Imagine you have an international list. People post in Korean,
Chinese, some even in English. Which encoding do I want to 
force on them? None!
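
As a rough sketch of what "honoring the incoming message" could mean in
practice, the charset each message declares can be read from its
Content-Type header and carried through to the generated page. The
function name charset_of and the "us-ascii" fallback are assumptions
for this example only; this is not how mhonarc is configured.

```python
from email import message_from_bytes


def charset_of(raw_message: bytes, default: str = "us-ascii") -> str:
    """Return the charset a message declares in its Content-Type header.

    Falls back to `default` when the message declares none.
    """
    msg = message_from_bytes(raw_message)
    return msg.get_content_charset() or default


if __name__ == "__main__":
    korean = b"Content-Type: text/plain; charset=ISO-2022-KR\r\n\r\nbody"
    untagged = b"Subject: hello\r\n\r\nbody"
    print(charset_of(korean))
    print(charset_of(untagged))
```

The value returned per message is what would then go into that
message's own charset tag, instead of one encoding for the whole list.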


On Tue, 14 Dec 1999 11:55:09 JST,  Oskar Bartenstein 
<oskar(_at_)ifcomputer(_dot_)co(_dot_)jp>  wrote:
Mail folders are encoded in JIS. Mhonarc correctly processes this,
and, as far as I can see, outputs correct JIS.

But HTML files should be in Shift-JIS or EUC, not JIS.
This is for three reasons:
 1 - the resulting HTML is human-readable 
 2 - the resulting HTML is machine-searchable
     e.g. ht:/dig will do just fine with EUC
 3 - the resulting HTML can be tagged with the specific
     character set, e.g. for EUC:
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=x-euc-jp">
     on a *per-document* basis for reliable delivery.
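
The per-document tagging in point 3 could be done by a small
postprocessing filter along these lines. This is a minimal sketch: the
function name add_charset_meta is hypothetical, and it assumes the
page has already been converted to the charset being declared.

```python
import re

META_TEMPLATE = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=%s">'

# Matches an existing Content-Type META tag, in any letter case.
HAS_META = re.compile(rb'<meta[^>]+http-equiv=["\']?content-type', re.IGNORECASE)
# Matches the opening HEAD tag (but not e.g. <header>).
HEAD_OPEN = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)


def add_charset_meta(html: bytes, charset: bytes = b"x-euc-jp") -> bytes:
    """Insert a charset META tag right after <HEAD>, unless one exists."""
    if HAS_META.search(html):
        return html
    meta = META_TEMPLATE % charset
    head = HEAD_OPEN.search(html)
    if head is None:
        # No HEAD element found: put the tag at the very top instead.
        return meta + b"\n" + html
    return html[: head.end()] + b"\n" + meta + b"\n" + html[head.end():]


if __name__ == "__main__":
    page = b"<HTML><HEAD><TITLE>t</TITLE></HEAD><BODY>x</BODY></HTML>"
    print(add_charset_meta(page).decode("ascii"))
```

Working on bytes rather than decoded text keeps the filter independent
of the page's encoding, since the tag itself is plain ASCII.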

Kazuro FURUKAWA <kazuro(_dot_)furukawa(_at_)kek(_dot_)jp>
 Linac,  High Energy Accelerator Research Organization (KEK), Japan
Dr. Oskar Bartenstein                 oskar(_at_)ifcomputer(_dot_)co(_dot_)jp
IF Computer Japan               
