Re: procmail and checking sendmail's second received header..

On Thu, 16 Nov 2000 02:08:47 -0500, Maria callas
<callas(_at_)encrypted(_dot_)net> wrote:

> I want to check sendmail's SECOND Received: header (by) for
> the host or IP there, and check that against a file (defined in a
> variable) containing a list of known spam sources or open relays. If
> it matches, then ... it doesn't matter. Send it to /dev/null.


What I have is not exactly what you're asking for, but it could
perhaps help set your brain moving in the right direction.

The rules below skip over Received: lines for "local" hosts and look
at the first Received: header after that. If you know exactly what the
first Received: header should contain, you could actually end up with
something +much+ simpler than this.

(Hint: ^Received: static or predictable blah blah$Received: \/grab this)
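If you want to try the idea behind the hint without a live procmail setup, the following rough Python sketch models it offline. It is an illustration only, not part of the procmail setup described in this message. It assumes the first Received: header is always the one your own host adds, so the interesting one is simply the second. The function names are made up.

```python
import re
from typing import List, Optional

def unfold_headers(header_text: str) -> List[str]:
    """Split a header block into logical headers, joining continuation
    lines (those starting with a space or tab) onto the previous one.
    Stops at the first empty line, i.e. where the body starts."""
    headers: List[str] = []
    for raw in header_text.splitlines():
        if raw == "":
            break
        if raw[0] in " \t" and headers:
            headers[-1] += " " + raw.strip()
        else:
            headers.append(raw)
    return headers

def second_received(header_text: str) -> Optional[str]:
    """Return the second Received: header, or None if there are fewer than two."""
    received = [h for h in unfold_headers(header_text)
                if h.lower().startswith("received:")]
    return received[1] if len(received) >= 2 else None

# A dotted quad in square brackets, where sendmail records the connecting IP
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def bracketed_ip(received: str) -> Optional[str]:
    """Pull the first [a.b.c.d] address out of a Received: header."""
    m = BRACKETED_IP.search(received)
    return m.group(1) if m else None
```

Check `second_received()` for None before passing its result on. A None means the headers are not in the shape you assumed.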

As some of the variable names imply, I use this for RBL checking. I've
omitted that part, though.


### Snipped a lot of uninteresting stuff below; hopefully I left in
##### all the pertinent variable definitions etc

### LOGGED_FROM is set in another file which calls this one; it is
##### a space or an empty string depending on whether I've faked a
##### From_ line in the log earlier and want subsequent From_s to be
##### indented by a space. Oops, you probably didn't even want to know.

SHELL=/bin/sh

# Set up some variables which are used globally.
# NL and REJ are just convenient shorthands; REJECT accumulates all the
#  rejection notices so we can produce a summary at the end.
# The accumulated statistics are not really useful for end users; I want
#  the statistics so I can see which recipes are actually the ones which
#  catch the most spam, so I can do various optimizations of these recipes
#  based on that information. You probably want to reject on the first match
#  and skip any further processing.
# REJFOLDER is set to either $SUSPFOLDER or $SPAMFOLDER depending on the
#  gravity of the diagnostics (as in, it's suspect, or it's sure to be spam).

REJECT=
NL="
"
REJ="X-Rejected: "

REJFOLDER=

# primary MX
MX='helsinki\.fi|iki\.fi'
# secondary
MX="$MX"'|pobox3\.funet\.fi|(hauki|lohi|mail)\.clinet\.fi'

# The remainder are hosts of mailing lists I subscribe to, not MX handlers
#  proper. I don't need (or want) to check them against the RBL, I want the
#  original injection point. So these should be skipped just like real MX:es.

# spam-list
MX="$MX"'|han\.de|hiss\.org|spam-archive\.org'
# cuci
MX="$MX"'|(cuci|giganet)\.nl|(smtp\.nl|adam\.ixe)\.net|regiovista\.com'
# procmail lists
MX="$MX"'|rwth-aachen\.de'
# EuroCAUCE
MX="$MX"'|zorch\.sf-bay\.org|(sfo|paix)\.cp\.net'
# Jargon-SE
MX="$MX"'|stacken\.kth\.se'

# The grab operator (\/) at the very start forces maximal matching of the first *
# Second condition excludes from checking anything where the entire Received:
#  chain consists of hosts in our list of trusted MX:es (i.e. last line of
#  MATCH is from one of the trusted ones)
# The "skip" of X-From_|From|Message-Id should really skip any non-Received
#  headers but I've found this to be good enough in practice. Also the
#  lines with only Received: (comments), which are produced by qmail and
#  SmartList and possibly some other mailing lists, are skipped as
#  uninteresting.
:0
* $ ^\/(Received: from ([a-z0-9_-]+\.)*($MX)\>.*($)\
        ((Received: \((from [^@]+@|[a-z]+ [0-9]+ invoked from network)|\
         (X-From_|From|Message-Id):).*($))*\
        )*\
      Received: from [^[]*\[[1-9][0-9]*\.[0-9]+\.[0-9]+\.[1-9][0-9]*
* ! MATCH ?? $ ^Received: from ([a-z0-9_-]+\.)*($MX)\>.*^^
{
    # Now trim down to the part we actually wanted
    :0
    * MATCH ?? ()\[\/[0-9.]+^^
    { }

    RBLIP=$MATCH
    LOG="rblcheck: checking IP $MATCH$NL${LOGGED_FROM}"

#### ... trimmed here ...
}
#### .... but left these two in as an additional bonus :-)

# External, yet a local or secondary MX's Message-Id
:0
* RBLIP ?? [0-9]
* $ ^Message-Id:[ 	]*\/<[^<>@]+@([a-z0-9_-]+\.)*($MX)>
{ REJECT="$REJECT${REJECT:+$NL}${REJ}External, yet local Message-Id: $MATCH"
  REJFOLDER=spam }


# My own little blacklist
:0
* RBLIP ?? [0-9]
* ? echo "$RBLIP" | grep -f $HOME/procmail/ip-block.txt
{ REJECT="$REJECT${REJECT:+$NL}${REJ}IP blocked in ip-block.txt: $RBLIP"
  REJFOLDER=spam }
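The big recipe's condition is hard to read, so here is a simplified Python approximation of what its comments describe. It walks the Received: headers line by line from the top. It skips headers from trusted hosts and the comment-only ones that qmail and SmartList add. It returns the bracketed IP of the first header left. This is not a literal translation of the procmail regex. In particular, it skips all non-Received headers, which the comments say would be the better behaviour. The function names are hypothetical. Only the primary and secondary MX patterns are copied; the mailing-list hosts would be appended the same way.

```python
import re
from typing import List, Optional

# Trusted relays, copied from the primary/secondary MX patterns above
MX_PATTERNS = [
    r"helsinki\.fi",
    r"iki\.fi",
    r"pobox3\.funet\.fi",
    r"(hauki|lohi|mail)\.clinet\.fi",
]
MX = "|".join(MX_PATTERNS)

# (?![a-z0-9_]) stands in for procmail's \> : the domain must not be followed
# by another word character (a following "." still counts as a boundary).
# Procmail matches case-insensitively by default, hence re.IGNORECASE.
TRUSTED_FROM = re.compile(
    r"Received: from ([a-z0-9_-]+\.)*(" + MX + r")(?![a-z0-9_])", re.IGNORECASE)

# Same shape as the IP part of the recipe: no leading zero in first/last octet
RELAY_IP = re.compile(r"\[([1-9][0-9]*\.[0-9]+\.[0-9]+\.[1-9][0-9]*)\]")

def unfold_headers(header_text: str) -> List[str]:
    """Join continuation lines onto their header; stop at the blank line."""
    headers: List[str] = []
    for raw in header_text.splitlines():
        if raw == "":
            break
        if raw[0] in " \t" and headers:
            headers[-1] += " " + raw.strip()
        else:
            headers.append(raw)
    return headers

def first_untrusted_ip(header_text: str) -> Optional[str]:
    """IP of the first Received: header not from a trusted host, or None."""
    for header in unfold_headers(header_text):
        lowered = header.lower()
        if not lowered.startswith("received:"):
            continue  # skip any non-Received header
        if lowered.startswith("received: ("):
            continue  # comment-only lines, e.g. "(qmail 1234 invoked from network)"
        if TRUSTED_FROM.match(header):
            continue  # one of our own MX or list hosts: keep looking
        m = RELAY_IP.search(header)
        return m.group(1) if m else None
    return None  # whole chain trusted, or no Received: headers at all
```

Returning None for an all-trusted chain corresponds to the recipe's second condition, which excludes mail that never left the trusted hosts.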

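The original question was about checking the IP against a file named in a variable. Note that `grep -f` treats every line of the file as an unanchored regular expression. The dots match any character, and 192.0.2.1 also matches 192.0.2.15. That is handy for listing ranges but easy to trip over. An empty line in the file matches every IP. Below is a small Python sketch of both behaviours, for checking a list offline. `load_patterns` and `ip_blocked` are made-up names. Python's `re` is only equivalent to grep's basic regular expressions for simple patterns like these.

```python
import re
from typing import Iterable, List

def load_patterns(path: str) -> List[str]:
    """Read one pattern per line, as `grep -f FILE` does."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\r\n") for line in fh]

def ip_blocked(ip: str, patterns: Iterable[str], exact: bool = False) -> bool:
    """exact=False mimics `grep -f` (regex, anywhere in the line);
    exact=True mimics `grep -F -x -f` (literal, whole-line match)."""
    for pat in patterns:
        if not pat:
            # Real grep treats an empty pattern as matching everything;
            # this sketch skips it instead.
            continue
        if exact:
            if ip == pat:
                return True
        elif re.search(pat, ip):
            return True
    return False
```

In the recipe itself, adding `-F -x` to the grep call should give the exact-match behaviour, at the cost of not being able to list ranges as patterns.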

The part which actually acts on REJECT and saves to the spam folder
if it is set is omitted as well. You don't need that stuff; I like to
get diagnostics on +all+ my spam recipes, so they all have to be run
before the spam is thrown away (or actually saved to my on-line spam
archive, http://www.iki.fi/era/spam/ :-)
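The accumulate-then-decide approach, as opposed to rejecting on the first match, is easy to model outside procmail as well. In `REJECT="$REJECT${REJECT:+$NL}${REJ}..."`, the `${REJECT:+$NL}` part expands to a newline only when REJECT is already non-empty, so the notices are joined without a leading blank line. The Python sketch below runs every check, joins the notices the same way, and reproduces the Message-Id test with an abbreviated MX list. The function names are hypothetical, and the message is modelled as a plain dict of already-extracted values.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

REJ = "X-Rejected: "

# Abbreviated trusted-domain list (only the primary MX patterns)
MX = r"helsinki\.fi|iki\.fi"
# Same shape as the grab in the Message-Id recipe: <local@[sub.]trusted-domain>
LOCAL_MSGID = re.compile(r"<[^<>@]+@([a-z0-9_-]+\.)*(" + MX + r")>", re.IGNORECASE)

Message = Dict[str, str]   # e.g. {"rblip": "203.0.113.9", "message-id": "<...>"}
Check = Callable[[Message], Optional[str]]

def external_yet_local_msgid(msg: Message) -> Optional[str]:
    """Flag a Message-Id from our own domains on mail that came from outside."""
    if not msg.get("rblip"):   # roughly RBLIP ?? [0-9]: only if an external IP was found
        return None
    m = LOCAL_MSGID.search(msg.get("message-id", ""))
    return "External, yet local Message-Id: " + m.group(0) if m else None

def run_all_checks(msg: Message, checks: List[Check]) -> Tuple[str, bool]:
    """Run every check (no early exit) and return (summary, is_spam)."""
    reject = ""
    for check in checks:
        reason = check(msg)
        if reason:
            # Equivalent of REJECT="$REJECT${REJECT:+$NL}${REJ}reason"
            reject += ("\n" if reject else "") + REJ + reason
    return reject, bool(reject)
```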

Hope this helps,

/* era */

Yes, I'm still alive, but I'm too overworked to be a regular on the
Procmail list right now, sorry folks :-(

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail