Malformed UTF-8 character


Hello,

I have a problem when files are indexed by /usr/bin/mknmz.
It's a known problem on this list (mharc-users(_at_)mhonarc(_dot_)org):

Malformed UTF-8 character (unexpected continuation byte 0xb8, with no prece.............


I saw on this list that I had to set $LANG to C before running the mharc
scripts. I did this, but nothing changed; I still get the same error.

$ export LANG=C
$ locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

It's very annoying because the indexing of the whole list doesn't work:
when I search for something that is in a message, I always get "No document
matching your query".

In my list directory I have all the files named NMZ....i, and NMZ.w
contains the word I'm looking for, but Namazu doesn't find it.

If I delete from the mbox the message that causes "Malformed UTF-8 character
(unexpected continuation byte 0xb8, with no prece............." (and run
web-archive -rebuild liste), everything is OK.

What should I do? I can't delete all the bad files.

Thanks for your response.






   Marie-Noelle DAUPHIN                                               
   IDRIS/CNRS                                                        
   Batiment 506 - B.P. 167 - 91403 Orsay Cedex - France                  
   Telephone : 33. 1. 69.35.85.41                                       
   Telecopie : 33.1 69.35.37.75  Messagerie : dauphin(_at_)idris(_dot_)fr       
 
   MIME messages welcome                                                       


