Hello,
I have a problem when files are indexed by /usr/bin/mknmz.
It's a known problem on this list, mharc-users(_at_)mhonarc(_dot_)org:
Malformed UTF-8 character (unexpected continuation byte 0xb8, with no
prece.............
I saw on this list that I had to set $LANG to C before running the mharc
scripts.
I did this, but nothing changed; I still get the same error.
$ export LANG=C
$ locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
It's very annoying because the indexing of the whole list doesn't work:
when I search for something that is in a message, I always get "No document
matching your query".
In my list directory I have all the files named NMZ....i, and NMZ.w
contains the word I'm looking for, but namazu doesn't find it.
If I delete from the mbox the message that triggers "Malformed UTF-8 character
(unexpected continuation byte 0xb8, with no prece............." (and run
web-archive -rebuild liste), everything is fine.
What must I do? I can't delete all the bad files.
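
To at least locate the affected messages instead of deleting them by hand, a
small script along these lines might help. This is only a sketch and not part
of mharc or Namazu: the function names are hypothetical, it assumes a plain
mbox where each message starts with a "From " line, and it will also flag
messages that are legitimately in another 8-bit charset such as ISO-8859-1.

```python
def split_mbox(raw: bytes) -> list:
    """Split raw mbox bytes into messages at lines starting with b'From '."""
    messages, current = [], []
    for line in raw.splitlines(keepends=True):
        # A new "From " separator closes the message collected so far.
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def first_bad_byte(message: bytes):
    """Return (offset, byte) of the first invalid UTF-8 byte, or None."""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, message[err.start]
    return None


def suspect_messages(raw: bytes) -> list:
    """List (message index, offset, byte) for messages that fail to decode."""
    hits = []
    for index, message in enumerate(split_mbox(raw)):
        bad = first_bad_byte(message)
        if bad is not None:
            hits.append((index, bad[0], bad[1]))
    return hits
```

Reading the mbox in binary mode (open(path, "rb").read()) and passing the
bytes to suspect_messages would give the positions of the messages to inspect.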
Thanks for your response.
Marie-Noelle DAUPHIN
IDRIS/CNRS
Batiment 506 - B.P. 167 - 91403 Orsay Cedex - France
Telephone: 33. 1. 69.35.85.41
Fax: 33.1 69.35.37.75 E-mail: dauphin(_at_)idris(_dot_)fr
MIME messages welcome