procmail

Re: scanning the body of a message

1997-05-13 01:44:00
"Guy" == Guy Geens <Guy(_dot_)Geens(_at_)elis(_dot_)rug(_dot_)ac(_dot_)be> writes:
"Wotan" == Wotan <wotan(_at_)netcom(_dot_)com> writes:
Wotan> How will this work?

Wotan> :0 B
Wotan> * ? sed -e 's/^/http:\/\/www\./' $PMDIR/sexspam
Wotan> /dev/null
Guy> I think the answer lies in a perl script, but I'm too tired to
Guy> write it right now.

I found a solution. I saw something like this in the `Programming
Perl' book, but I can't find the exact page anymore.

So here is the script urlgrep.pl:
#! /usr/bin/perl -w

$urlfile = shift @ARGV;
die "Usage: $0 filename" unless defined $urlfile;
open (URLFILE, $urlfile) || die "Can't open $urlfile: $!";

# Generate perl code in $evalstring
# Note: study may speed up the many matches run against each input line,
# or slow them down (see the study entry in perlfunc)
$evalstring = <<"EOF";
while (<STDIN>) {
    study;
EOF
while (<URLFILE>) {
    chomp; # Remove the newline
    next unless /\S/; # Skip blank lines: they would match every http://www. URL

    # Convert each line of URLFILE to a regex match
    # Return 0 if we get a match
    # The lines of URLFILE are taken as literal strings
    $evalstring .= "m,http://www\\.\Q$_\E, && return 0;\n";

    # Use the following for full regex matching:
    #s/,/\\,/g;  # escape all commas, since ',' delimits the m,,, match
    #$evalstring .= "m,http://www\\.$_, && return 0;\n";
}
close (URLFILE);
# Finish the while loop. If we get here, nothing has matched
$evalstring .= <<"EOF";
}
return 1;
EOF

# For testing purposes:
# print $evalstring;

# Evaluate the loop
$result = eval $evalstring;
# If we had a syntax error, die here
die $@ if $@;
# Exit with the result: 0 (a URL matched) makes procmail's "?"
# condition true, 1 (no match) makes it false
exit $result;
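To see what urlgrep.pl actually hands to eval, here is a small sketch that rebuilds $evalstring the same way for a made-up two-line URL file (the hosts example.com and foo-bar.net are hypothetical) and prints it. It also skips blank lines, which the script as first posted does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical contents of $PMDIR/sexspam (one host per line),
# made up for illustration only.
my @lines = ("example.com\n", "foo-bar.net\n");

# Same construction as urlgrep.pl: \Q...\E applies quotemeta, so every
# non-word character ("." and "-" here) is escaped and matches literally.
my $evalstring = "while (<STDIN>) {\n    study;\n";
for my $line (@lines) {
    chomp(my $host = $line);
    next unless $host =~ /\S/;    # an empty pattern would match every http://www. URL
    $evalstring .= "m,http://www\\.\Q$host\E, && return 0;\n";
}
$evalstring .= "}\nreturn 1;\n";

# Show the code that would be passed to eval
print $evalstring;
```

Running it prints the generated loop; the second host, for instance, becomes `m,http://www\.foo\-bar\.net, && return 0;`, with both the dot and the hyphen escaped.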

And this is the accompanying procmail recipe:
:0B
* ? urlgrep.pl $PMDIR/sexspam
/dev/null

(Usual precautions about dumping to /dev/null apply.)
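With Perl 5.6 or later, a variant that avoids string eval altogether is to quotemeta every line of the URL file and join them into one precompiled qr// alternation. This is a sketch under that assumption, not the script above: the file name urlgrep2.pl and the sub build_url_regex are hypothetical, and it keeps the same conventions (hosts taken literally after `http://www.`, case-sensitive matching, exit 0 on a match).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical alternative to urlgrep.pl: instead of generating code
# for eval, join all hosts into ONE regex alternation.
# Hosts are quoted with quotemeta, so they are matched literally.
sub build_url_regex {
    my @hosts = grep { /\S/ } @_;      # drop blank lines
    chomp @hosts;                      # strip trailing newlines
    return undef unless @hosts;        # empty list: nothing can match
    my $alt = join '|', map { quotemeta($_) } @hosts;
    return qr{http://www\.(?:$alt)};
}

# Act as a filter only when a URL file is named on the command line,
# e.g. from procmail:  * ? urlgrep2.pl $PMDIR/sexspam
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    my $re = build_url_regex(<$fh>);
    close($fh);
    exit 1 unless defined $re;         # no hosts: never a match
    while (my $line = <STDIN>) {
        exit 0 if $line =~ $re;        # match: procmail condition true
    }
    exit 1;                            # no match: condition false
}
```

It would be used like urlgrep.pl, with `* ? urlgrep2.pl $PMDIR/sexspam` as the condition in the same `:0B` recipe. The regex is compiled once before the body is read, and since no Perl code is generated there is no eval step that can fail.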

-- 
Guy Geens <ggeens(_at_)iname(_dot_)com>
Home Page: http://www.elis.rug.ac.be/~ggeens
finger ggeens(_at_)elis(_dot_)rug(_dot_)ac(_dot_)be for PGP public keys (or use a keyserver)