Re: Filtering Spam (was Re: A Call To Responsibility...)

        One idea (if you have CPU to burn)...

        Pipe all messages to a small C program that parses the From
        address and rejects any that come from '@site' where site does
        not have a valid A or MX record in the DNS... (Maybe check
        Reply-To: as well...)

        (As you would not be able to reply to them anyway :)
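
A rough sketch of that idea, in Perl rather than C so that it matches the script quoted further down. The helper names sender_site and site_resolves are made up for illustration. The default lookup only checks for an address record through the core gethostbyname call; an MX check would need a resolver module (Net::DNS, for example) and is left out here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the site (the part after '@') out of a
# From: or Reply-To: value. Handles "Name <user@site>" and
# "user@site (Name)"; returns undef if no address is found.
sub sender_site {
    my ($value) = @_;
    my $v = $value;
    1 while $v =~ s/\([^()]*\)//;    # drop (comments), innermost first
    $v =~ s/"[^"]*"//g;              # drop "quoted display names"
    # Prefer an address in angle brackets, else the first bare address.
    return lc $1 if $v =~ /<[^<>\s\@]+\@([^<>\s\@]+)>/;
    return lc $1 if $v =~ /[^\s<>\@,]+\@([^\s<>\@,]+)/;
    return;
}

# Hypothetical helper: true if the site can be looked up.
# $lookup can be swapped out for testing; the default (an assumption,
# not the full A-or-MX check) asks gethostbyname for an address record.
sub site_resolves {
    my ($site, $lookup) = @_;
    $lookup ||= sub { my $addr = gethostbyname($_[0]); defined $addr };
    return $lookup->($site) ? 1 : 0;
}

# Demo with a stand-in lookup so nothing touches the network.
my %known = ('example.com' => 1);
for my $from ('"Some One, Example Ltd" <one@example.com>',
              'two@no-such-site.invalid (Some Two)') {
    my $site = sender_site($from);
    my $ok   = site_resolves($site, sub { $known{ $_[0] } });
    print "$site: ", ($ok ? "keep" : "reject"), "\n";
}
```

Hooked into procmail, something like this would run as a "* ?" condition and exit 0 when the site does not resolve. A real lookup on every message costs a DNS round trip, which is where the "CPU to burn" comes in.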

                David/abs

david(_at_)city(_dot_)ac(_dot_)uk +44 171 477 8186 (MIME) 
david(_at_)southern(_dot_)com +44 0181 88 8949
Network Analyst, UCS, City University   System Manager, Southern Studios Ltd
Northampton Square, London EC1V 0HB                PO Box 59, London N22 1AR

        <<< Monochrome - Largest UK Internet BBS - telnet mono.org >>>

=- Microsoft: Abort and Retry Cancel -or- NetBSD: http://www.netbsd.org -=<

         (Apologies for long signature - in process of changing jobs)

On Mon, 30 Oct 1995, Hal Wine wrote:

At 10:42 10/30/95, J. Daniel Smith wrote:

Ok fine.  So let's begin to discuss some techniques for finding (and
thus deleting) SPAMs using procmail.  Something that would have caught
this particular spam is a check for more than a single address in the
"From: " header...  Here's an imperfect start
  :0
  * ^From:( |  )?.*@.*\..*,
  /dev/null
that is, pitch any From: header with a comma in it.  I've tried to make
the regexp a little bit more complicated so as to only match a comma
after an email address and not a "From: " header like
  From: "J. Daniel Smith, Bristol Technology" <dan(_at_)bristol(_dot_)com>


However, the above address could just as well have been given as:
    dan(_at_)bristol(_dot_)com (J. Daniel Smith, Bristol Technology)

In general, you're going to have trouble with this approach.  You can't
even check for more than one '@', as some mailers "redirect" messages using
something like:
    From: dan(_at_)bristol (Dan Smith) (by way of hal(_at_)dtor(_dot_)com (Hal Wine))
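
To make the failure cases concrete, here is a small Perl sketch that runs the comma rule and a plain '@' count over the header shapes discussed above. It assumes the quoted pattern reads as "a comma anywhere after something that looks like user@host.domain"; the sub names and the example.com addresses are placeholders, not anything from the original messages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed reading of the comma rule. \t stands in for the literal
# tab that a procmail recipe would use.
my $comma_rule = qr/^From:[ \t]?.*\@.*\..*,/;

# 1 if the comma rule would pitch this header line, else 0.
sub comma_rule_hits { return $_[0] =~ $comma_rule ? 1 : 0 }

# Number of '@' characters in the header line.
sub at_count { my $n = () = $_[0] =~ /\@/g; return $n }

my @samples = (
    'From: "Some One, Example Ltd" <one@example.com>',
    'From: one@example.com (Some One, Example Ltd)',
    'From: one@example (Some One) (by way of two@example.org (Some Two))',
    'From: one@example.com, two@example.org',
);

for my $h (@samples) {
    printf "comma rule: %d  '\@' count: %d  %s\n",
        comma_rule_hits($h), at_count($h), $h;
}
```

Only the last sample really has two senders. The comma rule also fires on the second one because of the comma inside the comment, and the '@' count cannot tell the third from the fourth.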

To really pull apart a multiple address "From:" you need more powerful
processing than I can figure out how to do directly in procmail.  Perhaps a
perl script called on suspect headers, e.g. one that has any comma in the
From::
  :0 h
  * ^From:.*,
  * ? isSpam
  /dev/null

Here's a quick perl script that should do the trick a bit more of the time:
#!/usr/local/bin/perl

# headers are piped into this program, read them all in at once
undef $/;
$headers = <>;

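# unfold the continued lines (the explicit \n in the pattern already
# matches across lines, so no multi-line setting is needed; $| only
# controls output flushing)
$headers =~ s/\n[ \t]+/ /g;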

# split into separate header lines, and toss into an array
foreach $line (split( /\n/, $headers )) {
    ($key, $value) = $line =~ /^(\S+)\s+(.*)$/;
    $key = "\L$key";
    # keep all data from multiply defined fields
    $sep = defined( $field{ $key } ) ? ',' : '';
    $field{ $key } .= "$sep$value";
}

$multiFrom = &test( "From:" )
          || &test( "Reply-To:" );

# exit 0 (success) on a multiFrom so the procmail "* ?" condition matches
exit $multiFrom ? 0 : 1;

sub test {
    local( $key ) = @_;
    local( $_ ) = $field{ "\L$key" };
    # strip comments
    1 while s/\s*\([^\(\)]*\)//;
    1 while s/\s*\"[^\"]*\"//;
    # any embedded white space or comma is multi address
    /\S[,\s]+\S/;
}

--Hal
Hal Wine <hal(_at_)dtor(_dot_)com>     voice: 510/482-0597