Re: Fast stripping of HTML tags from MHonArc-generated files

"CL" == Christopher Lindsey <lindsey(_at_)ncsa(_dot_)uiuc(_dot_)edu> writes:


CL> What does Wilma do with the stripped pages?  Are they stored on disk or
CL> deleted right after indexing?

They exist only in the pipeline internal to Glimpse.  They're never needed
again, since an HTML page only needs to be stripped while it is being indexed.
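For illustration only, here is a minimal sketch of a strip-as-you-go filter
of the kind that could sit in such a pipeline.  It is not Wilma's actual
stripper; the names strip_tags and filter_stream are made up for this
example, and it uses Python's standard html.parser rather than anything
tuned for speed.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and drops tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a script or style element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(html):
    """Return the text of an HTML string with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return parser.text()


def filter_stream(src, dst):
    """Read HTML from src, write stripped text to dst; nothing hits disk."""
    dst.write(strip_tags(src.read()))
```

Calling filter_stream(sys.stdin, sys.stdout) from a small script turns this
into a stdin-to-stdout filter, so the stripped text lives only in the pipe
feeding the indexer.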

CL> The search interface at mallorn.com strips the pages and stores them in
CL> an alternate directory used for indexing.

That would be a waste of space for Wilma.  Because of the peculiar way
Glimpse works, when you actually do the searching it is just fine to let it
have the HTML file.

CL> for me the biggest problem is indexing with glimpse.  That takes
CL> several hours to complete (still looking at realtime mySQL updates
CL> instead).

Wilma uses Glimpse's incremental indexing mode, which works rather quickly.
This restricts you to the Glimpse versions where incremental indexing
actually works, though.
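As a rough sketch of what an incremental run looks like when driven from a
script: the example below assumes glimpseindex's -f (incremental) and -H
(index directory) options as described in its man page.  The function names
and paths are hypothetical, and this is not necessarily how Wilma invokes
Glimpse.

```python
import subprocess


def incremental_index_command(index_dir, archive_dir):
    """Build a glimpseindex command line for an incremental update.

    Assumes -f requests incremental indexing and -H names the directory
    holding the index files; check the man page for your Glimpse version.
    """
    return ["glimpseindex", "-f", "-H", index_dir, archive_dir]


def run_incremental_index(index_dir, archive_dir):
    """Run the update; raises CalledProcessError if glimpseindex fails."""
    subprocess.run(incremental_index_command(index_dir, archive_dir),
                   check=True)
```

Only files created or changed since the last index should be rescanned this
way, which is why it finishes far sooner than a full rebuild.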

 - J<
