procmail

Filtering Spam (was Re: A Call To Responsibility...)

1995-10-30 17:45:27
At 10:42 10/30/95, J. Daniel Smith wrote:
Ok fine.  So let's begin to discuss some techniques for finding (and
thus deleting) SPAMs using procmail.  Something that would have caught
this particular spam is a check for more than a single address in the
"From: " header...  Here's an imperfect start
  :0
  * ^From:( |  )?.*@.*\..*,
  /dev/null
that is, pitch any From: header with a comma in it.  I've tried to make
the regexp a little bit more complicated so as to only match a comma
after an email address and not a "From: " header like
  From: "J. Daniel Smith, Bristol Technology" <dan(_at_)bristol(_dot_)com>

However, the above address could just as well have been given as:
    dan(_at_)bristol(_dot_)com (J. Daniel Smith, Bristol Technology)

In general, you're going to have trouble with this approach.  You can't
even check for more than one '@', as some mailers "redirect" messages using
something like:
    From: dan(_at_)bristol (Dan Smith) (by way of hal(_at_)dtor(_dot_)com (Hal Wine))
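
To see both misfires side by side, here is a small Python sketch (Python only because it is easy to run; it is not part of the procmail setup). It assumes the quoted condition reads `^From:( |\t)?.*@.*\..*,` once the archive's address mangling is undone, and the example.com / example.org addresses are made up for illustration. The comma rule fires on the comment form of a single sender, and a plain count of '@' fires on the redirected form.

```python
import re

# Assumed de-mangled form of the quoted procmail condition. procmail matches
# case-insensitively by default, hence re.IGNORECASE.
COMMA_AFTER_ADDRESS = re.compile(r"^From:( |\t)?.*@.*\..*,", re.IGNORECASE)

# Hypothetical single-sender headers in the three shapes discussed above.
QUOTED_NAME = 'From: "J. Daniel Smith, Example Corp" <dan@example.com>'
COMMENT_NAME = "From: dan@example.com (J. Daniel Smith, Example Corp)"
REDIRECTED = (
    "From: dan@example.com (Dan Smith) (by way of hal@example.org (Hal Wine))"
)


def comma_rule_fires(header: str) -> bool:
    """True when a comma appears somewhere after something address-like."""
    return COMMA_AFTER_ADDRESS.search(header) is not None


def at_count_rule_fires(header: str) -> bool:
    """True when the header contains more than one '@'."""
    return header.count("@") > 1


if __name__ == "__main__":
    for header in (QUOTED_NAME, COMMENT_NAME, REDIRECTED):
        print(comma_rule_fires(header), at_count_rule_fires(header), header)
```

Each of the three headers names exactly one sender, so any True in the output is a false positive.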

To really pull apart a multiple-address "From:" you need more powerful
processing than I can figure out how to do directly in procmail.  Perhaps a
perl script (called isSpam here) run on suspect headers, e.g. any message
with a comma in the From: header:
  :0 h
  * ^From.*,
  * ? isSpam
  /dev/null

Here's a quick perl script that should do the trick a bit more of the time:
#!/usr/local/bin/perl

# headers are piped into this program, read them all in at once
undef $/;
$headers = <>;

# unfold the continued lines (a newline followed by a space or tab)
$headers =~ s/\n[ \t]+/ /g;

# split into separate header lines, and toss into an array
foreach $line (split( /\n/, $headers )) {
    ($key, $value) = $line =~ /^(\S+)\s+(.*)$/;
    $key = "\L$key";
    # keep all data from multiply defined fields
    $sep = defined( $field{ $key } ) ? ',' : '';
    $field{ $key } .= "$sep$value";
}

# keys are stored with their trailing colon, so both lookups need one
$multiFrom = &test( "From:" )
          || &test( "Reply-To:" );

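# exit status 0 makes procmail's "?" condition succeed, so the recipe fires
# on a multiFrom; the parentheses keep the whole ?: expression as exit's argument
exit( $multiFrom ? 0 : 1 );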

sub test {
    local( $key ) = @_;
    local( $_ ) = $field{ "\L$key" };
    # strip comments
    1 while s/\s*\([^\(\)]*\)//;
    1 while s/\s*\"[^\"]*\"//;
    # any embedded white space or comma is multi address
    /\S[,\s]+\S/;
}
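
For trying the strip-then-look idea away from a live mailbox, here is a rough Python rendering of the test routine above. It is only a sketch: the function names and the example.com / example.org addresses are invented for illustration, and it checks a single header value rather than reading a message from stdin.

```python
import re


def strip_comments(value: str) -> str:
    """Remove (...) comments, innermost first, until none are left."""
    previous = None
    while previous != value:
        previous = value
        value = re.sub(r"\s*\([^()]*\)", "", value)
    return value


def strip_quoted(value: str) -> str:
    """Remove "..." quoted strings such as display names."""
    return re.sub(r'\s*"[^"]*"', "", value)


def looks_multi_address(value: str) -> bool:
    """Same final test as the Perl: any embedded white space or comma."""
    value = strip_quoted(strip_comments(value))
    return re.search(r"\S[,\s]+\S", value) is not None


if __name__ == "__main__":
    samples = [
        "a@example.com, b@example.org",
        '"J. Daniel Smith, Example Corp" <dan@example.com>',
        "dan@example.com (J. Daniel Smith, Example Corp)",
        "dan@example.com (Dan Smith) (by way of hal@example.org (Hal Wine))",
        # An unquoted display name still has embedded white space, so the
        # heuristic flags it even though there is only one sender.
        "Dan Smith <dan@example.com>",
    ]
    for sample in samples:
        print(looks_multi_address(sample), sample)
```

The last sample shows where the heuristic stays imperfect: an unquoted display name trips the white-space test, which is one reason to run the check only on headers that already look suspect.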

--Hal
Hal Wine <hal(_at_)dtor(_dot_)com>     voice: 510/482-0597

