Re: Use scoring to determine header format?

At 22:31 2004-05-17 -0400, fleet(_at_)teachout(_dot_)org wrote:

On Mon, 17 May 2004, Professional Software Engineering wrote:

> At 16:44 2004-05-17 -0400, fleet(_at_)teachout(_dot_)org wrote:
> >I'm seeing spam messages that appear to be from one individual (or
> >perhaps one software) that have a specific header format as:

[header format snipped]

> matching the above as-is certainly doesn't mandate using scoring to achieve it.


I'm not sure what you're saying here.  I tried, without success:

* Received:
* Received:
* Received:
* Message-id:
* Received:



One condition line, no scoring:

:0
* ^Received:.*^Received:.*^Received:.*^Message-id:.*^Received:

> * ^(From|Date|Subject|Reply-To):(.*$)+Received:

This works; but doesn't restrict the matches with respect to number (of
course).  But now I'm confused about the '+'.  Here it seems to be
concatenation and not "one or more."


The point was, one condition line, and it crosses header lines.

> There's no RFC which declares that Received headers must appear before others.
And that answers my other question!  Thank you.

Well, that doesn't mean that using it as a spammy trait isn't useful - just as certain keywords are more spammy than others. However, traits which find an equal distribution amongst spam and legitimate traffic are a bit more difficult to justify.

:0
* -4^0
* 1^0 ^Received:(.*$)+Received:
* 1^0 ^Received:(.*$)+Received:
* 1^0 ^Received:(.*$)+Message-Id:
* 1^0 ^Message-Id:(.*$)+Received:
* 1^0 ^Received:

The first two regexp conditions will match the SAME two received headers. If you really want three in a row, why not just add a third Received in ONE condition? If you were to duplicate the condition line a third time, you'd still be matching on TWO received lines (and there's no requirement here that they be BEFORE the Message-ID, or consecutive).

The final condition will match on any old received header, and is just about GUARANTEED to match on every email that passes through your system (at least via SMTP - a local delivery directly from some app into your LDA won't insert one, but that supposes that something is bypassing the MTA to do so).

FURTHER - the (.*$)+ expression will match *MULTIPLE* intermediate header lines. Thus, the following will match your complete recipe:


Received: blah
Message-Id: blah
From: yea_not_part_of_the_condition
Received: blah

Those two received lines meet the first and second conditions, the first received and the message-id meet the third condition, the message-id and the skip-then-received line meet the fourth condition, and the FIRST received line is going to match your final condition.
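
To make the arithmetic concrete, here is the same scoring recipe with the running total worked out for the four-line sample above. This is only a sketch: the mailbox name score-test.mbx is made up for illustration (the original recipe's action line wasn't shown), and the notes sit above the recipe because a trailing comment on a condition line is liable to be taken as part of the regexp.

```procmail
# Sample header:  Received / Message-Id / From / Received
#
#   condition                          matched by             total
#   -4^0                               (always applies)        -4
#    1^0 ^Received:(.*$)+Received:     1st + 2nd Received      -3
#    1^0 ^Received:(.*$)+Received:     the same two again      -2
#    1^0 ^Received:(.*$)+Message-Id:   1st Received + M-Id     -1
#    1^0 ^Message-Id:(.*$)+Received:   M-Id + 2nd Received      0
#    1^0 ^Received:                    any Received            +1
#
# The final score is 1, which is positive, so the recipe fires
# even though the message has only two Received: lines.
:0:
* -4^0
* 1^0 ^Received:(.*$)+Received:
* 1^0 ^Received:(.*$)+Received:
* 1^0 ^Received:(.*$)+Message-Id:
* 1^0 ^Message-Id:(.*$)+Received:
* 1^0 ^Received:
score-test.mbx
```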

If you want three receiveds, a message-id, and a fourth received, scoring isn't part of the picture - a single-line unified regexp is:


:0:
* ^Received:(.*$)Received:(.*$)Received:(.*$)+Message-Id:(.*$)+Received:
spew.mbx


In English:

Three received lines in IMMEDIATE SUCCESSION (no intermediate headers), then optionally other headers (the + following the third received expression), then the Message-Id:, followed by optional intermediate headers (again, the +), followed by another Received:

Lose the + expressions if you actually want the series to be consecutive headers without intermediate fluff.
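
For comparison, a sketch of that strictly consecutive variant: every + is removed, so each (.*$) consumes exactly one line end and nothing may sit between the five headers. It reuses the spew.mbx mailbox from the recipe above, assumes that "consecutive" really is the intent, and is untested against the original sample.

```procmail
# Three Received:, then Message-Id:, then Received:, with no other
# header lines in between.
:0:
* ^Received:(.*$)Received:(.*$)Received:(.*$)Message-Id:(.*$)Received:
spew.mbx
```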

There's no scoring, as it really isn't applicable here, and it's only confusing the matter for you.

Increasingly, I find that unwanted email doesn't really carry a lot of extra Received headers - sure, some spammers still think it'll throw off the scent, but so many seem to be spamming directly from broadband accounts nowadays instead of spoofing through other servers (many of which get blocked by DNSBLs).

The problem is - How do I say in the last condition "Received: followed by
NOT Received.  I tried * 1^0 ^Received:(.*$)+[^Received:], which didn't

Uhm, why do you need to do this? Shouldn't the Message-id:(.*$)^Received: match your last two header conditions just fine? If you really want something following the final received header, you can add (.*$)(.*$) to the regexp I gave above, or you can put a character class inversion:


[^R][^e][^c][^e][^i][^v][^e][^d][^:]

Which I think is wholly unnecessary unless you really believe there's an issue where there will be ONLY one Received: after the Message-Id, but MULTIPLE such headers are kosher.
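
If the goal really is "exactly one Received: right after the Message-Id:", a second, negated condition may be easier to read than the character classes. A minimal sketch, assuming that is the intent and that the headers in question are adjacent; the mailbox is the same spew.mbx used earlier, and the recipe is untested:

```procmail
# Conditions are ANDed: the first requires a Received: directly after
# the Message-Id:, the second (inverted with !) rejects the message
# if two Received: lines follow it in a row.
:0:
* ^Message-Id:(.*$)Received:
* ! ^Message-Id:(.*$)Received:(.*$)Received:
spew.mbx
```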


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail