procmail

Re: Stripping HTML: Another Question

1997-10-09 22:00:55
Quoting _Clint_ (clint(_at_)cray-ymp(_dot_)acm(_dot_)stuorg(_dot_)vt(_dot_)edu):
:0:
* ^Subject: Your registered page.*has changed
| perl -pe 's/<[^>]*>//g' | perl -pe 's/====URL-MINDER====//g'
$HOME/incoming         # (ignore the line break)
  
I tried doing this, and it didn't filter it correctly, and my logfile says:

procmail: Locking "/home/clint/incoming/tv.lock"
procmail: Executing " perl -pe 's/<[^>]*>//g' | perl -pe
's/--====URL-MINDER====/OH!/g' >> $HOME/incoming/tv"
/g: Event not found.

Hmm. My question would be: why does the version in the logs say
"URL-MINDER====/OH!/g"? Did you edit the script you sent? I'd say that
the shell is trying to interpret the !/g as a history reference, i.e. a
request to repeat the last command that began with /g (and can't find
it). This script is also going to strip out all the email addresses in
the header (which are also surrounded by <>). Try this:

:0
* ^Subject: your registered page.*has changed
{
        :0fb
        | sed 's/<[^>]*>//g' | sed 's/--====URL-MINDER====/OH\!/g'

        :0:
        incoming/tv
}
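
The header concern mentioned above the recipe is easy to check from a
shell. This is only a sketch: the From: line and address are made up for
illustration, and strip_tags is just a name I picked, not part of any
tool.

```shell
#!/bin/sh
# Hypothetical helper: the same tag-stripping substitution used in the recipe.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# Made-up header line; real headers wrap addresses in <> the same way.
header_line='From: Clint <clint@example.com>'

# Run the header through the filter, as an unflagged pipe recipe would.
stripped=$(printf '%s\n' "$header_line" | strip_tags)
printf '%s\n' "$stripped"
```

The address disappears along with the angle brackets, which is why the
filter should only ever see the body.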

As I said in the first message, perl is overkill for simple regex search
and replace; sed will do the same thing with fewer resources. This
recipe starts by checking the headers for the proper subject. Then it
runs the body of the message through the sed processes, using the f flag
to indicate filtering (which means that the message has not yet been
delivered) and the b flag to feed only the body to the filter, so the
addresses in the header are left alone. The next rule locks and delivers
to incoming/tv.
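
If you want to see what the filter step does before trusting it with
real mail, something like the following can be run by hand. It is a
sketch under a few assumptions: the sample body is invented, the two
substitutions are combined into one sed call with -e (a variation, not
what the recipe above uses), and it is run under /bin/sh, where ! has
no special meaning and so needs no backslash.

```shell
#!/bin/sh
# Invented sample body, roughly the shape of a URL-minder notice.
body='<b>Page changed</b>
--====URL-MINDER====
<a href="http://example.com/">link</a>'

# Same two substitutions as the recipe, in a single sed process.
# sed works line by line, so a tag split across lines would survive.
filtered=$(printf '%s\n' "$body" |
    sed -e 's/<[^>]*>//g' -e 's/--====URL-MINDER====/OH!/g')

printf '%s\n' "$filtered"
```

Whether one sed or two is used makes no difference to the result; the
single call just saves a process.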

-- 
Michael Stone, Sysadmin, ITRI     PGP: key 1024/76556F95 from mit keyserver,
mstone(_at_)itri(_dot_)loyola(_dot_)edu            finger, or email with 
"Subject: get pgp key"