On May 26, 2004 at 16:03, David L. Dewey wrote:
Check your *LANG* environment settings. Before running mharc scripts,
or mknmz, set them to the C locale. Namazu does not support UTF-8
locale settings.
Thanks, Earl, but that didn't seem to work... LANG is now
set to C. I began the reindex and it ran for a long time
w/o error, but then suddenly blew up with tens of thousands
of these again:
There may be multiple language-related environment settings. Do a
printenv and examine which environment variables need to be fixed;
LC_ALL and the individual LC_* variables take precedence over LANG.
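
A minimal sketch of that check, assuming a POSIX-style shell. The grep
pattern and the choice to export both variables are illustrative
assumptions, not something mharc or Namazu requires:

```shell
# List every locale-related variable currently set in the environment.
# The fallback message covers the case where none are set at all.
printenv | grep -E '^(LANG|LANGUAGE|LC_[A-Z]+)=' || echo "no locale variables set"

# Override them for this shell session only. LC_ALL takes precedence over
# the individual LC_* variables and over LANG, so setting it to C covers
# any LC_CTYPE or similar setting that is still UTF-8.
export LANG=C LC_ALL=C

# Show the result before starting the reindex.
printenv | grep -E '^(LANG|LC_ALL)='
```

Anything started from this shell afterwards, including mknmz, inherits
the C locale settings.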
Malformed UTF-8 character (unexpected continuation byte
0xb8, with no preceding start byte) in pattern match (m//)
at /usr/local/share/namazu/filter/mailnews.pl line 216,
<GEN3> line 45191.
This problem occurs because perl is treating the source character
encoding as UTF-8, but the source contains 8-bit octets that should
not be treated as UTF-8.
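
To see which input actually contains such octets, something like the
following may help. It is only a sketch: it assumes iconv and mktemp are
installed, is_valid_utf8 is a hypothetical helper name, and the two
sample files are made up for illustration:

```shell
# Hypothetical helper: succeeds only if the file named by $1 is valid UTF-8.
# iconv exits non-zero when it meets a byte sequence it cannot decode.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Build two small sample files. \270 is octal for 0xb8, the byte named in
# the error message; on its own it is a continuation byte with no
# preceding start byte.
sample_dir=$(mktemp -d)
printf 'plain ASCII line\n' > "$sample_dir/ok.txt"
printf 'stray byte: \270\n' > "$sample_dir/bad.txt"

# Report which files would trip a UTF-8 decoder.
for f in "$sample_dir/ok.txt" "$sample_dir/bad.txt"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: contains bytes that are not valid UTF-8"
    fi
done
```

Messages flagged this way are typically in an 8-bit charset such as
ISO-8859-1, which is why they need to be handled as bytes.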
As a hack, I added a 'use bytes' pragma within the block that is
causing problems, to force perl to treat data in the offending
regex as bytes instead of characters:
--- mailnews.pl.20040505 2004-05-05 14:52:23.000000000 -0700
+++ mailnews.pl 2004-05-05 15:03:43.000000000 -0700
@@ -209,6 +209,7 @@ sub mailnews_citation_filter ($$) {
$$contref = "";
my $i = 0;
for my $line (@tmp) {
+ use bytes;
# Complete excluding is impossible. I tnink it's good enough.
# Process only first five paragrahs.
# And don't handle the paragrah which has five or longer lines.
--ewh