Fast stripping of HTML tags from MHonArc-generated files

1998-11-12 17:08:10
I'm trying to improve the speed at which Wilma indexes.  Right now the real
bottleneck is that we pass every MHonArc-generated page through the
striphtml program, which is written in Perl.  The time to load the Perl
interpreter tens or hundreds of thousands of times is pretty harsh, and
occasionally we've seen HTML that the simple regexp-based approach freaks
out on, causing it to take near infinite time to process.

Does anyone know of any free (i.e. we can incorporate it into something
under the Artistic License) C code, or a small utility that we can call,
which will do this?
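
For illustration only, a single-pass stripper of the kind being asked
about might look like the sketch below.  The function name strip_tags and
its buffer contract are assumptions made for this example; they are not
part of Wilma, MHonArc, or striphtml.  It reads each byte exactly once, so
its running time is linear in the input and it cannot blow up the way a
backtracking regexp can.  It is deliberately naive: it does not decode
entities, does not treat comments or scripts specially, and will end a tag
early at a '>' inside a quoted attribute value.

```c
#include <stddef.h>

/*
 * Copy src to dst, dropping everything from a '<' up to and including
 * the next '>'.  dst must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL.
 * An unterminated tag swallows the rest of the input.
 */
size_t strip_tags(const char *src, char *dst)
{
    enum { IN_TEXT, IN_TAG } state = IN_TEXT;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        if (state == IN_TEXT) {
            if (*src == '<')
                state = IN_TAG;     /* start skipping */
            else
                dst[n++] = *src;    /* ordinary text: keep it */
        } else if (*src == '>') {
            state = IN_TEXT;        /* tag closed: resume copying */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Linked into the indexer, or wrapped in a small filter that is started once
and fed many files, something along these lines would also avoid the
per-page cost of starting an interpreter.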

Jason L Tibbitts III - tibbs(_at_)uh(_dot_)edu - 713/743-3486 - 660PGH - 94 
   System Manager:  University of Houston Department of Mathematics 
      "I survived while Ruby died in Jackie's trashy fantasy..."
