procmail

Re: Stripping HTML: Another Question

1997-10-09 15:19:52
_Clint_ <clint(_at_)cray-ymp(_dot_)acm(_dot_)stuorg(_dot_)vt(_dot_)edu> wrote:
> Greetings ProcMail Gurus,
> I will cut to the chase here:
> Isn't there a command-line method for stripping HTML using perl?

This is not a procmail question, but I have gotten so sick of it
that I looked up the answer.

The long way is a script like this:

 #!/usr/local/bin/perl5.004
 
 use HTML::Parse;
 use HTML::FormatText;
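 $/ = undef;                          # slurp mode: read the whole input at once
 $rawhtml = <>;                       # stdin, or any files named on the command line
 $html = parse_html($rawhtml);        # build a parse tree from the HTML
 $formatter = HTML::FormatText->new;  # plain-text formatter
 print $formatter->format($html);     # render the tree as text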

Shorter would be:

 perl5.004 -MHTML::Parse -MHTML::FormatText -0777e '$f=new HTML::FormatText;print $f->format(parse_html(<>))'
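To tie this back to procmail, here is a minimal sketch of a recipe that
runs the long script as a filter. It assumes the script is saved as
$HOME/bin/striphtml and is executable (that path is made up for the
example), and that the message is a single-part text/html mail whose
body is not base64 or quoted-printable encoded. Treat it as a starting
point, not a tested setup.

```
# Hypothetical location of the script shown above.
STRIPHTML=$HOME/bin/striphtml

# Filter (f) only the body (b) through the script and wait (w) for it to finish.
:0 fbw
* ^Content-Type:.*text/html
| $STRIPHTML

# Only if the filter above succeeded (a), rewrite the header to match the new body.
:0 afhw
| formail -I "Content-Type: text/plain"
```

Neither recipe delivers the message, so it carries on through the rest
of the rcfile with the plain-text body.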

I don't know of a module to interpret MIME mail so you can decode
MIME parts inline, but one probably exists. Look in the Mail::
or MIME:: modules.

Elijah
------
I /dev/null dupes, no need to CC list posts.  It is not my responsibility to
prove to you my mail is not spam, if mail to you bounces it will not be resent.