procmail

RE: Local domain forgery detection?

2002-08-28 14:20:57
    :0
    * 1^1 ^Received:
    { countRCVD = $= }

Thanks!  The existence of that variable wasn't very obvious in the man
pages; it's in procmailsc(5) but only in BUGS in procmailrc(5).  :)

You're welcome.


 * COUNTReceived ?? ^1$

Perhaps you meant the more canonical

   * COUNTReceived ?? ^^1^^

Is one format superior to the other?  I don't see the difference,
unless perhaps the ^^ form gets parsed faster.  Is it significant,
or just an equivalent alternative?

The ^ and $ imply line start or end (they are interchangeable 
in procmail, but we tend to use them linearly).  Actually, 
they each mean the literal newline char.  "^^" means the
leftmost edge or rightmost edge of the field being examined.
If I've misstated something, I look forward to correction.
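A minimal sketch of the difference, assuming a hypothetical variable VAR that holds several lines. The names VAR, LINE_HIT and FIELD_HIT are made up for illustration:

```procmail
# VAR holds three lines; the middle one is exactly "1".
VAR="21
1
3"

# ^ and $ each match a newline, so this finds the "1" line inside VAR.
:0
* VAR ?? ^1$
{ LINE_HIT = yes }

# ^^ anchors to the very start and very end of the text being examined,
# so this matches only when VAR as a whole is exactly "1".
:0
* VAR ?? ^^1^^
{ FIELD_HIT = yes }
```

With this VAR, the first recipe should match and the second should not. For a counter that only ever holds a single number, the two forms should give the same answer, which is why the ^^ form is a matter of precision rather than speed.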


 :0  # if it's local mail (including via our mailhost), deliver it
     * $ $INFINITY^0 ^Received:.*\<myispname.com \[566\.684\.
     * $         2^0 ^Message-ID:[$WS]*<[^$WS]+@localhost>$
     *          -1^2 ^Received:
   $DEFAULT

This counts the Received: headers at the same time that it's
conducting the reasonably secure test of a valid Received: line.
If there are too many, it won't consider the mail local.
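To make the arithmetic concrete, here is a stripped-down sketch of just the two scored conditions that matter when the trusted-relay line is absent. It only logs the score and never delivers. The rule it illustrates is the one in procmailsc(5): for a condition written w^x, the first match adds w, the second adds w*x, the third w*x*x, and so on. The simplified Message-ID regex and the log text are assumptions for illustration:

```procmail
# Score only; the empty block means nothing is delivered here.
:0
*  2^0 ^Message-ID:.*@localhost>
* -1^2 ^Received:
{ }

# $= is set after the conditions are parsed, even when the recipe fails.
#   Message-ID matches, 1 Received: header:   2 - 1          =  1  (recipe matches)
#   Message-ID matches, 2 Received: headers:  2 - 1 - 2      = -1  (no match)
#   Message-ID matches, 3 Received: headers:  2 - 1 - 2 - 4  = -5  (no match)
LOG="local-score=$=
"
```

In the full recipe, the first condition carries the weight $INFINITY. Assuming INFINITY was set earlier to 2147483647, the scoring ceiling, a hit on that line should settle the result before the Received: penalty is counted at all.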

But if you need to do that count more than once, isn't it faster
to use a result stored in a variable?  So for sendmail, maybe
something like:

 MYDOMAIN=| hostname | sed "s/`hostname -s`\.//"

I just have a real aversion to piping to two processes on every mail,
for something we can reasonably expect to get within procmail.  Surely
the host name, or [127.0.0.1], or "localhost" is stated in the
top Received: header?  Even if you do want to run hostname, you
could use MATCH to kill the TLD stuff and avoid sed.  
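A sketch of the MATCH approach, assuming `hostname` prints a fully qualified name on the system in question. FQDN is a hypothetical variable name:

```procmail
# One external process instead of a hostname/sed pipeline.
FQDN=`hostname`

# Skip the first label and its dot; everything after \/ lands in MATCH.
:0
* FQDN ?? ^^[^.]+\.\/.+
{ MYDOMAIN = $MATCH }
```

If the name has no dot in it, the condition fails and MYDOMAIN is simply left unset. procmail also predefines HOST, which could stand in for FQDN and avoid the external process entirely, though whether HOST is fully qualified depends on the system.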

Besides, that syntax for var assignment b0rked procmail 3.22/3.23pre
on an Alpha system I run procmail on.  (Known occasional bug.)

If you really want this variable every time, how about feeding it
to procmail via an INCLUDERC?
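Something along these lines, where the file name and location are hypothetical and the file is written once (by hand, or by a cron job) rather than computed on every delivery:

```procmail
# Contents of $HOME/.procmail/mydomain.rc (hypothetical file):
MYDOMAIN=myispname.com

# In $HOME/.procmailrc, before any recipe that uses $MYDOMAIN:
INCLUDERC=$HOME/.procmail/mydomain.rc
```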

As to your question about counting Received: headers, we have
done that already above.  We just assign the value of the score
to a variable.  We might have to play with the choice of the
scoring exponent.  I chose "2" because I figured it makes
visceral sense that the more Received: headers there are,
the further away from "clean" we move exponentially.  But
if you're going to use the count, well, just write it "1^1"
instead.
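A sketch of counting once and reusing the result later. MAXHOPS and FEW_HOPS are hypothetical names, and the threshold of 4 is arbitrary:

```procmail
# Count the Received: headers once; $= holds the score of the last recipe.
:0
* 1^1 ^Received:
{ countRCVD = $= }

# Later: succeed only when there were fewer than MAXHOPS Received: headers.
MAXHOPS=4

# Score is MAXHOPS minus the stored count; the recipe matches while it is positive.
:0
* $  ${MAXHOPS}^0
* $ -${countRCVD:-0}^0
{ FEW_HOPS = yes }
```

The `:-0` default is there because the counting block never runs on a message with no Received: headers at all, so countRCVD would otherwise be unset.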


Okay. :)  Here's my "ATCOUNT" thingee:

Cool!  Thanks!  But...

        :0  # add the subtotals, subtract 4 "gimmes"
            * $ $=^0
            * -4^0
          { TOO_MANY = $ATCOUNT }

Is the TOO_MANY variable actually useful for anything?  Aren't
cases where there are more than two CC recipients *really* common?

Yes, but I use the variable in concert with other tests to
decide if it's spammy.  Am I on the To: line?  Am I on the
Cc: line(s)?  Is the Subject: empty?  Is the Message-ID:
putatively valid?  (Lots of legit mail has Message-IDs that
violate RFCs, including Microsoft Exchange's format, I believe.)
So I don't kill based only on that, but combine it with others
of what I call "indicia" (word of art taken from Supreme Court dicta
discussing the 13th Amendment).  Are there any spaces in the
From: line's text?  Etc.  These all, taken in various combinations, 
comprise what I call a calculus of spammy stuff.  I look for
a spammy calculus.  I'll admit, though, that a forged hotmail
or yahoo address is a dead ringer.  :)
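One way such a combination could be written with weighted scoring is sketched below. This is not the actual rule set described above: the address, the weights, the choice of three indicia, and the SPAMMY variable are all assumptions for illustration.

```procmail
# Start 5 below zero; each indicium that fires adds 3,
# so any two of the three push the score positive.
:0
* -5^0
*  3^0 ! ^(To|Cc):.*me@example\.com
*  3^0 ^Subject: *$
*  3^0 TOO_MANY ?? .
{ SPAMMY = yes }
```

No single test is enough to set SPAMMY here, which is the point of treating them as indicia rather than as verdicts.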

-- 
Dallman Ross

"If you find a path with no obstacles, it probably does not lead to
anywhere."
        Thoughts of Rev. Sunnan Kubose, from _Zen in the Markets_ 

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail