"CL" == Christopher Lindsey <lindsey(_at_)ncsa(_dot_)uiuc(_dot_)edu>
CL> What does Wilma do with the stripped pages? Are they stored on disk or
CL> deleted right after indexing?
They exist only in the pipeline internal to Glimpse. An HTML page only
needs to be stripped while it is being indexed, and the stripped text is
never needed again afterwards.
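
To illustrate what stripping in a pipeline means, here is a minimal sketch
in Python. It is not Wilma's actual filter: the names TagStripper,
strip_html and filter_stream are made up for this example, and it assumes
that dropping tags plus script/style content is all the stripping that is
wanted. The point is only that HTML goes in one end and plain text comes
out the other, with nothing written to disk.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text and drops markup (hypothetical helper, not Wilma's code)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self.skip_depth == 0:
            self.chunks.append(data)


def strip_html(html):
    """Return the text of an HTML document with the markup removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements does not run
    # together, then collapse the leftover runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


def filter_stream(src, dst):
    """Read HTML from file object src, write the stripped text to dst."""
    dst.write(strip_html(src.read()) + "\n")
```

Calling filter_stream(sys.stdin, sys.stdout) would turn this into a filter
that can sit in a pipe in front of an indexer.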
CL> The search interface at mallorn.com strips the pages and stores them in
CL> an alternate directory used for indexing.
That would be a waste of space for Wilma. Because of the peculiar way
Glimpse works, it is just fine to let it have the original HTML file when
you actually do the searching.
CL> for me the biggest problem is indexing with glimpse. That takes
CL> several hours to complete (still looking at realtime mySQL updates
Wilma uses Glimpse's incremental indexing mode, which works rather quickly.
The catch is that this restricts you to the Glimpse versions in which
incremental mode actually works.
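
For anyone wondering why incremental mode is so much faster, the general
idea can be sketched as below. This is not Glimpse's code, only an
illustration under the assumption that "incremental" means looking at
files changed since the last index run; files_needing_reindex is a
hypothetical name.

```python
import os


def files_needing_reindex(root, last_index_time):
    """Yield paths under root modified after last_index_time.

    last_index_time is in seconds since the epoch, e.g. the mtime of
    the existing index file.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_index_time:
                    yield path
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
```

Only the files this yields have to be stripped and indexed again, so the
cost depends on how much changed rather than on the size of the archive.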