Re: Question

1998-08-04 18:26:36
"RAS" == Richard, Arnold S 
<asdr(_at_)illangai(_dot_)cc(_dot_)utexas(_dot_)edu> writes:

RAS> How do I make WEBGLIMPSE just look for the requested SEARCH word in
RAS> ONLY the header and contents and not elsewhere.

I'm not sure that you can.  What we did for Wilma (a special-purpose
WebGlimpse variant geared towards MHonArc and mbox files) was to filter
each message at indexing time: we strip all HTML and also everything
outside the X-MsgBody tags that MHonArc conveniently puts in the message.
(This is done using .glimpse_filters.)  Thus when you search you get a
cleaner hit list.  There is some trickery, though: glimpse builds a fast
index that can give false hits, so it goes back to the original file to
confirm that each hit is real.  We don't apply the filtering at search
time, which keeps searching fast and gets the line numbers right.
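
To make the two stages concrete, here is a toy model in Python.  It is
only a sketch of the idea, not Wilma's or glimpse's actual code: the
marker strings, the in-memory "index" and the function names are all
assumptions made up for illustration, and tag stripping is left out to
keep it short.  The index is built from the filtered body only, while
the confirmation pass reads the unfiltered file, so the reported line
numbers refer to the original.

```python
# Assumed marker text; substitute whatever your MHonArc pages contain.
BODY_START = "<!--X-Body-of-Message-->"
BODY_END = "<!--X-Body-of-Message-End-->"


def message_body(raw):
    """Return only the text between the body markers ('' if no start marker)."""
    start = raw.find(BODY_START)
    if start == -1:
        return ""
    start += len(BODY_START)
    end = raw.find(BODY_END, start)
    return raw[start:] if end == -1 else raw[start:end]


def build_index(files):
    """Index time: map each word of the *filtered* body to the files holding it."""
    index = {}
    for name, raw in files.items():
        for word in message_body(raw).lower().split():
            index.setdefault(word, set()).add(name)
    return index


def search(word, files, index):
    """Search time: take candidates from the index, confirm in the *raw* file."""
    needle = word.lower()
    hits = []
    for name in sorted(index.get(needle, ())):
        for lineno, line in enumerate(files[name].splitlines(), start=1):
            if needle in line.lower():
                hits.append((name, lineno, line))
    return hits
```

A word that occurs only in a page header or footer never reaches the
index, so that file is never offered as a candidate for it.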

Unfortunately we use a Perl script to do the stripping, which isn't fast
and can effectively hang on malformed HTML.  Does anyone know of a C-based
HTML tag stripper?  (Lynx is unfortunately a bit heavy-handed for that
purpose.)
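
For reference, the kind of stripper meant here can be a single-pass
state machine, which looks at each character exactly once and so always
finishes, even on unterminated tags.  The sketch below is in Python for
brevity and is not the Perl script mentioned above; the name strip_tags
is hypothetical.  It deliberately ignores comments, entities and ">"
inside quoted attribute values, but the same loop should port to C
almost line for line.

```python
def strip_tags(html):
    """Drop everything between '<' and '>' in one left-to-right pass."""
    out = []
    in_tag = False
    for ch in html:
        if in_tag:
            if ch == ">":
                in_tag = False
                out.append(" ")  # keep words on either side of a tag apart
        elif ch == "<":
            in_tag = True
        else:
            out.append(ch)
    return "".join(out)
```

An unterminated tag simply swallows the rest of the input rather than
stalling the filter.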

 - J<

  • Question, Richard, Arnold S.
    • Re: Question, Jason L Tibbitts III <=