Earl Hood wrote:
I have, in a myriad of ways. I just recreated things on one of my
local systems to make analysis easier.
I've made the command used and the output of a stock namazu 2.0.14
installation available for your examination at
<http://www.mhonarc.org/tmp/mknmz-out.txt.gz>. I.e., no modifications
were made to the namazu code, so the many "malformed utf-8 ..." messages
are included. Perl also complains about wide characters in print.
I've also made available the input files and NMZ.* files at
the following locations:
<http://www.mhonarc.org/tmp/namazu-users-en_NMZ_files.tar.gz>
<http://www.mhonarc.org/tmp/namazu-users-en_input_files.tar.gz>.
Unfortunately, Namazu supports only ASCII text as input.
(However, Japanese text can be used in a Japanese environment.)
For instance,
namazu-users-en/2000-07/msg00000.html is Japanese text.
namazu-users-en/2003-06/msg00000.html is non-ASCII text.
...
There are many more besides these.
Please use Namazu with ASCII-only text.
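One way to find the offending messages before running mknmz is to scan the input tree for files that contain any non-ASCII byte. The following is only a rough sketch in Python, not part of Namazu; the function name find_non_ascii_files is made up for illustration, and it assumes the input files are small enough to read into memory whole.

```python
import os


def find_non_ascii_files(root):
    """Return sorted paths (relative to root) of files containing a byte >= 0x80.

    Hypothetical helper for illustration; it is not part of Namazu.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read raw bytes so that no decoding step can fail or warn.
            with open(path, "rb") as fh:
                data = fh.read()
            # bytes.isascii() (Python 3.7+) is True only if every byte is < 0x80.
            if not data.isascii():
                hits.append(os.path.relpath(path, root))
    return sorted(hits)
```

Calling find_non_ascii_files("namazu-users-en") on the unpacked input files should then list the messages to exclude or convert before indexing.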
Also, the "Malformed UTF-8 ..." warnings are popping up, regardless
of what LANG or LC_ALL are set to. I had to add a 'use bytes' pragma
to mailnews.pl at line 212 to get rid of the warnings.
'use bytes' only suppresses the warning; it does not solve the root
of the problem.
By the way,
line 216 of mailnews.pl contains EUC-JP text.
The Japanese processing is performed even outside a Japanese environment.
That is why the warning is emitted.
I want to fix this so that the Japanese processing is done only in a
Japanese environment.
(However, this does not mean that non-ASCII text will become supported.)
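The idea of running the Japanese processing only in a Japanese environment can be sketched as a simple guard on the locale variables. This is an illustration in Python, not the actual change to mailnews.pl; the function name is_japanese_env is hypothetical, and it assumes the environment is determined from LC_ALL, LC_CTYPE and LANG in the usual POSIX order of precedence, which may differ from how Namazu decides.

```python
import os


def is_japanese_env(environ=None):
    """Return True if the effective locale setting names a Japanese locale.

    Hypothetical helper; Namazu's own language detection may work differently.
    """
    env = os.environ if environ is None else environ
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            # The first non-empty variable decides, e.g. "ja_JP.eucJP" or "C".
            return value.lower().startswith("ja")
    return False
```

A filter would then apply its EUC-JP patterns only when is_japanese_env() returns True and skip them otherwise.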
--
=====================================================================
TADAMASA TERANISHI
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint = 474E 4D93 8E97 11F6 662D 8A42 17F5 52F4 10E7 D14E
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en