Re: Use scoring to determine header format?

2004-05-17 15:15:43
At 16:44 2004-05-17 -0400, fleet(_at_)teachout(_dot_)org wrote:
> I'm seeing spam messages that appear to be from one individual (or
> perhaps one piece of software) that have a specific header format:
>
> Received:
> Received:
> Received:
> Message-Id:
> Received:
>
> Is it possible to construct a scoring recipe that would identify messages
> with this header format?  Is a message that seems to have a Received: out
> of place indicative of spam, or badly formed, or not abnormal at all?

You could use scoring if you wanted, but scoring is only a means to an end: matching the header sequence above as-is doesn't require it.
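To illustrate the difference, here is a minimal, untested sketch of both approaches for the header order in the question. The variable names SEQ_HIT and SEQ_SCORE are hypothetical, and keying on Message-Id: is used only because the question did; the reasons it is a poor choice are given further down.

```
# Sketch only. Both recipes just set a variable; nothing is delivered.
# SEQ_HIT and SEQ_SCORE are hypothetical names.

# 1) Plain condition: a Message-Id: header with a Received: header
#    somewhere after it. No scoring involved.
:0
* ^Message-Id:(.*$)+Received:
{ SEQ_HIT=yes }

# 2) The same test as a weighted condition. It contributes 25 points
#    when it matches; $= holds the recipe's score afterwards.
:0
* 25^0 ^Message-Id:(.*$)+Received:
{ }
SEQ_SCORE=$=
```

The first form is enough when a single test decides the outcome. The second only pays off when several weak indicators are to be combined.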

I've used the following recipe for a long time:

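# Received: headers AFTER other headers...
# Runs only when the ADVISORIES variable contains "on"; matches when a
# Received: header appears somewhere after a From:, Date:, Subject: or
# Reply-To: header.
:0
* ADVISORIES ?? on
* ^(From|Date|Subject|Reply-To):(.*$)+Received:
{
        # some mailing lists trip this too, so the weight is kept low
        SPAMVAL="+25"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Advisory - Interspersed Received: headers${NL}"
}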


I should warn you that while a match used to be pretty consistently spam, some mailing lists that insert their own headers have a tendency to trip it. In my case, this recipe adds only about 10% of my spam threshold, so it doesn't produce false positives unless a number of more significant factors are already present.
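The idea of a small weight measured against a larger threshold can be sketched with procmail's built-in scoring. This is an untested illustration, not the setup described above: SCORE, THRESHOLD and the "spam" folder are assumed names, and 250 is chosen only so that 25 points is 10% of it.

```
# Sketch only: SCORE, THRESHOLD and the "spam" folder are assumed names.
SCORE=${SCORE:-0}
THRESHOLD=250

# Carry the running total in, add 25 (10% of THRESHOLD) if a Received:
# header follows one of the usual message headers, and store the result.
:0
* $ ${SCORE}^0
* 25^0 ^(From|Date|Subject|Reply-To):(.*$)+Received:
{ }
SCORE=$=

# Deliver to the spam folder only when the total exceeds the threshold.
# The empty regular expressions always match, so this recipe's score is
# simply SCORE - THRESHOLD.
:0
* $ ${SCORE}^0
* $ -${THRESHOLD}^0
spam
```

With these numbers, the interspersed-Received: test alone leaves a message at 25 of 250, so it only matters when other tests have already pushed the total close to the limit.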

Looking back about three weeks, I see a couple dozen list trips, and only about three actual spams (all of which were scored as seriously spammy even without this).


Note that my recipe above (which morphed from a discussion on this list two or three or more years ago) does not use Message-Id. In fact, of all the headers to include, that would be a bad one: too many systems (or client hosts) emit messages without a Message-ID of their own, which requires the receiving (or an intermediate) host to insert one to make the message RFC-compliant. I do check for non-users sending messages to MY mail host for which my mailhost must insert a Message-ID, but I know better than to flag a message submitted by a user to their own ISP's mailhost as spammy just because that user's host didn't include a Message-ID and the ISP had to insert one.

There's no RFC which declares that Received headers must appear before others.

Some mailing lists strip the Received: headers that were added before a message reached the list, either for privacy or to reduce retransmission overhead (1KB of extra cruft times 10,000 subscribers adds up; just look at the headers on the procmail list messages!). This causes issues with some filters. I've got filters that check whether hosts associated with a sender's freemail domain (hotmail, yahoo, etc.) appear in the Received: chain, but when someone uses one of these services to mail a list which strips headers, the condition gets tripped. To combat this, and some other recipes which are more prone to tripping on lists, I grant lists an "allowance" for spam scoring.
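One way such an allowance might look, as an untested sketch using procmail's weighted conditions: SCORE is an assumed running total, and both the 50-point figure and the choice of list headers are illustrative rather than taken from the setup described here.

```
# Sketch only: SCORE is an assumed running total; the 50-point
# allowance and the header list are illustrative choices.
SCORE=${SCORE:-0}

# Subtract the allowance when the message carries a typical list header.
:0
* $ ${SCORE}^0
* -50^0 ^(List-Id|List-Post|X-Mailing-List):
{ }
SCORE=$=
```

Run ahead of the final threshold test, this lets list traffic absorb a couple of low-weight trips without being flagged.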

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail