Date: Wed, 1 Oct 1997 17:56:06 +0300 (EET DST)
From: Kimmo Jaskari <kimmo(_at_)alcom(_dot_)aland(_dot_)fi>
Subject: HTML mailer bouncing
Alternatively, if someone knows a good way to strip the html tags from a
message and turn it into plain old text I would appreciate tips.
This Perl filter is a start.
#! /usr/bin/perl -w
###
### Simple HTML stripper.
###
require 5.002 ;
## Read text.
undef $/ ;
$_ = <> ;
## Extract body element if present.
s|.*<body.*?>(.*)</body.*?>.*|$1|is ;
## Strip all tags.
s/<!--.*?-->//gs ; # comments
s/<.*?>//gs ; # regular tags
## Convert escapes.
s/&lt;/</g ;
s/&gt;/>/g ;
s/&amp;/&/g ;
## Clean up vertical whitespace.
s/^\s*\n// ; # top
s/\n\s*\n/\n\n/g ; # middle
s/\s*\n\s*$/\n/ ; # bottom
## Write stripped text.
print ;
If you want to get fancy, you could try to interpret some of the tags
and maybe wrap text.
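
As a rough sketch of what "getting fancy" might look like, the version
below turns a handful of structural tags into plain-text layout before
stripping the rest, then wraps each line with the standard Text::Wrap
module. The sub name html_to_text, the particular tags handled, and the
72-column default are all my own assumptions for illustration; adjust to
taste.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: interpret a few tags, strip the others, wrap.
sub html_to_text {
    my ($html, $width) = @_;
    local $Text::Wrap::columns = $width || 72;   # assumed default width

    for ($html) {
        s/<!--.*?-->//gs;                  # comments
        s/\s+/ /g;                         # source line breaks mean nothing in HTML
        s/<br\b[^>]*>/\n/gi;               # explicit line break
        s{</?(?:p|div|h[1-6]|ul|ol)\b[^>]*>}{\n\n}gi;   # block boundaries
        s/<li\b[^>]*>/\n* /gi;             # list items become bullets
        s/<[^>]*>//g;                      # every remaining tag
        s/&lt;/</g;
        s/&gt;/>/g;
        s/&amp;/&/g;                       # last, so "&amp;lt;" yields "&lt;"
    }

    # Wrap line by line so the breaks introduced above survive.
    my @out;
    for my $line (split /\n/, $html) {
        $line =~ s/^ +| +$//g;
        if ($line eq '') { push @out, ''; next; }
        my $indent = $line =~ /^\* / ? '  ' : '';   # hang bullet continuations
        push @out, wrap('', $indent, $line);
    }

    my $text = join("\n", @out) . "\n";
    $text =~ s/\n{3,}/\n\n/g;              # at most one blank line in a row
    $text =~ s/^\n+//;                     # top
    $text =~ s/\n+$/\n/;                   # bottom
    return $text;
}

# Small demonstration on a literal string.
print html_to_text(
    '<html><body><h1>Notice</h1><p>Plain text &amp; nothing else, please.</p>'
  . '<ul><li>no tags<li>no fonts</ul></body></html>'
);
```

To use it as a filter like the script above, replace the demonstration
with something like "undef $/ ; print html_to_text(<>) ;". Note that
Text::Wrap keeps lines strictly shorter than $Text::Wrap::columns.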
--
Why do they call it rush hour when nothing moves? -- Mork