Re: question regarding header handoff to external script

On Fri, 24 Dec 1999 12:51:57 -0800 (PST), Jauder Ho
<jauderho(_at_)carumba(_dot_)com> wrote:

:0 c
| mailExtractIP 

# (how do we capture the result back from the output of mailExtractIP?)

:0
* IPDEFINED 
  :0
     # check for existence in RBL by either using rblcheck or the
     # spambouncer stuff
     {
             :0 f
             | ${FORMAIL} -A"X-SBRule: IP ${CHECKIP} is in DUL"
     }
| dmail +incoming/spam


rblcheck comes with a simple example of how to do this (although it's
not very efficient in rblcheck 1.4). There was a longish thread here
recently which was started by Walter Dnes. You should probably look at
how he ended up doing it. The code is now part of the Spam Dunk
package, I believe. <http://www.interlog.com/~waltdnes/spamdunk/>

You don't necessarily want the extra set of braces there. Remember the
conditions are evaluated top to bottom and ANDed. So you can say
simply

    :0fwh  # notice use of w and h flags!
    * CHECKIP ?? .
    * check for existence in RBL by either using rblcheck or ...
    | formail -A "X-SBRule: IP $CHECKIP is in DUL"

    :0a
    | dmail +incoming/spam

... depending of course a bit on the exact nature of the second
condition.

#!/usr/bin/perl
#
# mailExtractIP
#
# Originating host IP extraction from mail headers coming from STDIN
#

my @orgheaders;
my @fixedheaders;


while (<STDIN>) {
     chomp();
     push(@orgheaders,$_);
}

for (@orgheaders) {
     if (/^\s/) {
             s/^\s+/ /;
             $fixedheaders[$#fixedheaders] .= $_;
     } else {
             push(@fixedheaders,$_);
     }
}

for (@fixedheaders) {
     if (/^Received:/) {
             ($matchIP) = /^Received: from.*\[(.*)\].*/;
             print $matchIP;
     }
}


You can simplify the Perl here a bit, but the most important fix would
be to tighten up your regexps. In Perl, .* always matches for as long
as possible. So if you have "Received: from foo (bar [baz]) [possibly
forged]" your regex will grab the "possibly forged" string rather than
"baz" (which would be the desired IP number).

Anyway, it's probably better to do this entirely within Procmail.
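
A minimal sketch of what that might look like, assuming Sendmail-style
Received: lines which carry the peer address in square brackets.
CHECKIP is simply the variable name from your recipe, and the pattern
is an illustration rather than a tested drop-in:

```procmail
# Grab the first bracketed dotted quad in a matching Received: line.
# [^[]* cannot run past an opening bracket, so a later
# "[possibly forged]" comment is never picked up instead.
:0
* ^Received: from [^[]*\[\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
{ CHECKIP=$MATCH }
```

MATCH holds only the text to the right of the \/ operator, so CHECKIP
ends up with the bare address and no brackets.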

On Sat, 25 Dec 1999 14:28:07 -0800 (PST), Jauder Ho
<jauderho(_at_)carumba(_dot_)com> wrote:

On Sat, 25 Dec 1999, Stan Ryckman wrote:

If you fix that up (and I don't know how you're intending to
pick the "correct" Received: to look at) then...

Thanks for that. That was a little bug that I've fixed. My
intention is to look at every extracted IP (the really interesting
one is usually the last), since an open relay used as an intermediate
step should be suspect.


Actually the +first+ one +after+ your local IPs should be the one you
check. Walter's solution handles that just fine.

FWIW, it's relatively rare to see multi-hop relay rapes. They do
happen, but more often than not, anything after the first non-local
Received: header is likely to be forged anyway, so checking all IP
numbers is usually a royal waste of time.

1) How do I get the exit code back from an external program?


It's in $? or you can use it directly in a recipe with the * ? prog
syntax. (Example below.)

2) The way you have the recipe worded, how does mailExtractIP get passed
   via STDIN the headers to be parsed?


In an assignment like this,

    VALUE=`program`

the program will get the message passed to it on stdin. Fortunately,
it will not "consume" the message the way the same construct would in
the shell; every program which reads its stdin gets a "fresh" copy of
the message being processed (except of course that it might have been
modified by filtering actions in previous recipes).
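
For instance, a sketch using the names from your recipe. It assumes
mailExtractIP is on the PATH and prints the address on stdout; SUBJECT
is a made-up variable which is only there to show the second read:

```procmail
# Backticks capture the program's stdout; the message arrives on its stdin.
CHECKIP=`mailExtractIP`

# This command gets its own copy of the message, so it still sees the
# full header even though mailExtractIP has already read all of it.
SUBJECT=`formail -xSubject:`
```

Note that a program run this way is handed the entire message, body
included, not just the header.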

3) is it possible/legal to do 

   | formail -A"X-SPAM: Originating or relay host is in RBL" | dmail +incoming/spam

that is to tack on a message header before sending it off to the spam
folder?


Perfectly legal. Possibly you want to split it, though: the shell
reports only the exit code of the last command in a pipeline, so in
the unlikely event of a problem with the formail call, the exit code
from formail will be lost. You will also be invoking a shell just to
evaluate the multi-pipe pipeline.
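
Split into two recipes, it might look like the following sketch (the
header text and the dmail invocation are taken from your question; the
choice of flags is mine):

```procmail
# Filter the header only (h) and wait for formail's exit code (w);
# if formail fails, the message is left unfiltered.
:0 fwh
| formail -A "X-SPAM: Originating or relay host is in RBL"

# Deliver only if the filter above completed successfully (a).
:0 a
| dmail +incoming/spam
```

Whether you want the a flag on the second recipe depends on what
should happen to the message when formail fails; without it, the
delivery is attempted regardless.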

Here's what I use. It's probably not as elaborate as Walter's stuff
(and certainly not as portable -- my regular expressions are fairly
Sendmail-centric) but it gets the job done for me.

First, the pseudocode version, rated PG:

    MX=foo|bar|baz  # Actually I should perhaps list these by IP number!
    :0
    * $ ^(Received.*$MX)*|Received: from [a-z0-9.]* \[\/[0-9][0-9.]+[0-9]
    * ! ? rblcheck $MATCH
    { REJECT="reason for rejection" }

    # ...

    :0
    * REJECT ?? .
    { copy REJECT to LOG, and formail -I a rejection notice to the headers }

And now, the full version, rated R:

######## huge mounds of SHELL=/bin/sh and SPAMFOLDER=something and 
# all that jive

# Set up some variables which are used globally.
# NL and REJ are just convenient shorthands; REJECT accumulates all the
#  rejection notices so we can produce a summary at the end.
# The accumulated statistics are not really useful for end users; I want
#  the statistics so I can see which recipes are actually the ones which
#  catch the most spam, so I can do various optimizations of these recipes
#  based on that information. You probably want to reject on the first match
#  and skip any further processing.

REJECT=
NL="
"
REJ="X-Rejected: "

# Also LOGGED_FROM gets set to either a space or the empty string
#  depending on whether I want the regular log messages left aligned
#  or not.

######## modest amounts of other stuff elided

# rblcheck

# primary MX
MX='helsinki\.fi|iki\.fi'
# secondary
MX="$MX"'|pobox3\.funet\.fi|(hauki|lohi)\.clinet\.fi'

# The remainder are hosts of mailing lists I subscribe to, not MX handlers
#  proper. I don't need (or want) to check them against the RBL, I want the
#  original injection point. So these should be skipped just like real MX:es.

# spam-list
MX="$MX"'|han\.de|hiss\.org|spam-archive\.org'
# cuci
MX="$MX"'|(cuci|giganet)\.nl|(smtp\.nl|adam\.ixe)\.net|regiovista\.com'
# procmail lists
MX="$MX"'|rwth-aachen\.de'

# First grab operator is to force maximal matching on the first *
# Second condition excludes from checking anything where the entire Received:
#  chain consists of hosts in our list of trusted MX:es (i.e. last line of
#  MATCH is from one of the trusted ones)
# The "skip" of X-From_|From|Message-Id should really skip any non-Received
#  headers but I've found this to be good enough in practice. Also the
#  lines with only Received: (comments), which are produced by qmail and
#  SmartList and possibly some other mailing lists, are skipped as
#  uninteresting.
:0
* $ ^\/(Received: from ([a-z0-9_-]+\.)*($MX)\>.*($)\
        ((Received: \((from [^@]+@|[a-z]+ [0-9]+ invoked from network)|\
         (X-From_|From|Message-Id):).*($))*\
        )*\
      Received: from [^[]*\[[1-9][0-9]*\.[0-9]+\.[0-9]+\.[1-9][0-9]*
* ! MATCH ?? $ ^Received: from ([a-z0-9_-]+\.)*($MX)\>.*^^
{
    # Now trim down to the part we actually wanted
    :0
    * MATCH ?? ()\[\/[0-9.]+^^
    { }

    RBLIP=$MATCH
    LOG="rblcheck: checking IP $MATCH$NL${LOGGED_FROM}"
    RBLREJ="$REJ$RBLIP blocked in "

    :0
    * ! ? $HOME/bin/osf1/rblcheck -c -q -s relays.mail-abuse.org $RBLIP
    { RBLREJECT="$RBLREJECT${RBLREJECT:+$NL}${RBLREJ}RSS" }

    :0
    * ! ? $HOME/bin/osf1/rblcheck -c -q -s relays.orbs.org $RBLIP
    { RBLREJECT="$RBLREJECT${RBLREJECT:+$NL}${RBLREJ}ORBS" }

# Etc for a largish number of other lists; see <http://www.iki.fi/era/rbl/>

}
:0E
{
    # We want a diagnostic when the MATCH was completely unsuccessful,
    #  but not if checking was skipped because the MATCHed Received: lines
    #  were from trusted MX hosts
    :0
    * ! MATCH ?? ^^Received:
    { LOG="spam.rc: no IP address for rblcheck$NL$LOGGED_FROM" }
}

######## huge mounds of other spam checking snipped

:0
* REJECT ?? .
{
    :0  # If RBLREJECT is set, add that to main REJECT
    * RBLREJECT ?? .
    { REJECT="$REJECT${REJECT:+$NL}$RBLREJECT" }

    :0  # Spaghetti code indeed. Modify following sed if LOGGED_FROM is set
    * LOGGED_FROM ?? ^^ ^^
    { SEDADDR="1s/^X-Rejected/spamreject/ -e 1!" }

    LOG=`echo "$REJECT" | \
        sed -e $SEDADDR's/^X-Rejected/ spamreject/'`"$NL$LOGGED_FROM"

    :0fwh
    | formail -I"$REJECT"

#    EXITCODE=67

    :0:
    $SPAMFOLDER
}

Hope this helps,

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition