mharc-users

Re: subject line truncated in archives search

2004-12-18 15:47:33
Quoting Earl Hood (earl(_at_)earlhood(_dot_)com):

On December 16, 2004 at 15:51, Dave Dewey wrote:

Background - I recently rebuilt my archives index due to some issues with
Namazu related to encoding on a new mailserver.  Everything seems to be
fine; Namazu now runs from cron with no errors.

Which version of namazu are you using?

2.0.13.

However, when you do a search in the list archives, the returned search
results page contains mostly random single-character subject lines.
Sometimes a single letter or a number, often a single question mark, and
sometimes the full subject.  The two-line content snip underneath is also a
single character, although the Author and Date lines are fine.  The Thread
and Date indexes are fine: all subjects are intact there, and the problem
appears only when searching.

Take a look at the file NMZ.field.subject in the html directory
for the archive you are searching.  The file should contain the
subjects of the message pages, one subject per line.  Check to
see if any of the lines are blank or contain junk.

Oy, that took a while to scan visually -- 210,588 lines.  Big archive.  All
of the lines look intact, including the ones that appear as single
characters in the returned search.
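
For a file of this size, the check can be scripted rather than done by eye.
The following is a minimal sketch, not part of namazu or mharc: the function
name find_suspect_subjects and the min_length threshold are hypothetical, and
decoding as UTF-8 is an assumption (an archive stored in another encoding
will have its non-ASCII subjects reported as undecodable).

```python
import sys


def find_suspect_subjects(path, min_length=2):
    """Return (line_number, reason, text) for lines that look blank or damaged."""
    suspects = []
    # Read bytes so that a wrong encoding guess cannot abort the scan.
    with open(path, "rb") as fh:
        for number, raw in enumerate(fh, start=1):
            # Assumption: UTF-8; undecodable bytes become U+FFFD.
            text = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
            stripped = text.strip()
            if not stripped:
                reason = "blank"
            elif len(stripped) < min_length:
                reason = "too short"
            elif "\ufffd" in text:
                reason = "undecodable bytes"
            elif any(ord(ch) < 32 and ch != "\t" for ch in text):
                reason = "control character"
            else:
                continue
            suspects.append((number, reason, text))
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_subjects.py /path/to/html/NMZ.field.subject
    for number, reason, text in find_suspect_subjects(sys.argv[1]):
        print(f"{number}: {reason}: {text!r}")
```

An empty report would agree with the visual scan; any line numbers it prints
can be compared against the messages that show up truncated in search results.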

You can also test subject-specific searching in namazu by doing:

  +subject:<text-here>

This way you can verify if it is only a subject display issue
in the search results or if subject text is messed up in the
search index.  Doing a search like above should cause namazu
to use the NMZ.field.subject file.

Performing a search in this manner produces the same results; i.e.,
many (but not all) single-character subject lines.

So the subjects are intact in NMZ.field.subject, but aren't always making
it into the results page.

Another thing to do is to create a sample archive containing the
message files giving truncated subjects in search results.  You can
do this by just copying a msg#####.html file into a test directory.
Then run mknmz on the directory and see if the file is indexed
properly.  You can examine the NMZ.field.subject file created during
indexing to see if the subject was properly extracted from the file.
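
The test-archive steps above could be scripted roughly as follows.  This is
only a sketch under stated assumptions: the helper names stage_messages,
index_directory and read_subjects are hypothetical, mknmz is assumed to be on
PATH and to write its NMZ.* files into the current working directory when no
output directory is given, and any extra options a mharc setup passes to
mknmz are not reproduced here.

```python
import shutil
import subprocess
from pathlib import Path


def stage_messages(message_files, test_dir):
    """Copy the chosen msg#####.html pages into a test directory."""
    test_dir = Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the copy it created.
    return [Path(shutil.copy2(src, test_dir)) for src in message_files]


def index_directory(test_dir):
    """Run mknmz on test_dir, leaving the index files in that directory."""
    target = Path(test_dir).resolve()
    # Assumption: with no output directory given, mknmz writes NMZ.* files
    # to the current working directory, so cwd is set to the test directory.
    subprocess.run(["mknmz", str(target)], cwd=target, check=True)


def read_subjects(test_dir):
    """Return the extracted subjects, one entry per indexed page."""
    data = (Path(test_dir) / "NMZ.field.subject").read_bytes()
    # Assumption: UTF-8; undecodable bytes are shown as U+FFFD, not dropped.
    return data.decode("utf-8", errors="replace").splitlines()
```

Calling stage_messages with a few of the affected pages, then
index_directory, then read_subjects on the same directory shows whether the
subjects were extracted intact from those particular files.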

I'll try this next.  Thanks for your help!

dave

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHARC-USERS
