new spam filtering rule

Okay, this will be wildly unpopular with those in affected countries, but as I directly correspond with so few people with two letter TLDs, this makes for a reasonable attribute to check:

Variables used in this recipe (ENVFROM and FROM_DOMAIN) are common extractions which can be found in the sandbox published at my website.
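
For readers without the sandbox at hand, here is a minimal sketch of what such extractions could look like. It is an assumption for illustration only, not the sandbox code: it takes ENVFROM from the From_ line, takes FROM_DOMAIN from the first address in the From: header, and defines NL as a literal newline.

```procmail
# Assumed helper: a literal newline for building multi-line notes
NL="
"

# Assumed extraction: envelope sender from the mbox "From " line
:0
* ^From \/[^ ]+
{ ENVFROM=$MATCH }

# Assumed extraction: domain part of the first From: header address
:0
* ^From:.*@\/[-a-z0-9.]*[a-z0-9]
{ FROM_DOMAIN=$MATCH }
```

With these, a message sent directly by joe@example.de should leave ENVFROM holding the full address and FROM_DOMAIN holding example.de, which is the shape the conditions in the recipe expect (no angle brackets, nothing trailing the TLD).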


:0
* ENVFROM ?? ()\.\/..^^
* $ FROM_DOMAIN ?? ()\.$MATCH^^
{
        SPAMVAL="+50"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"

        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope sender is a two letter TLD${NL}"


        # continuing, we add MORE spammishness if the TLD matches a list
        :0
        * MATCH ?? ^^(ru|hu|it|br|uy|pl|pt|za|cl|ch|sk|ua|su|cz|cc|sg|tw|ro)^^
        {
                SPAMVAL="+50"
                SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
                SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope sender tld is ${MATCH}${NL}"
        }
}

I check both the envelope and the From: domain because some lists I use email from a two letter tld (er, such as procmail). Further, by extracting a match on the first one and matching for the SAME tld, I reduce (though not eliminate) matches where a user of a tld list happens to also be at a 2 letter tld domain. If the two intersect, yeah, they're going to be flagged, but at least a .uk on a .de list won't. Granted, I'm seeing plenty of spams where they are using two different domains.

Additionally, in the second (braced) condition level of the recipe, we optionally match against a list of tlds which are particularly spammy (in my case, as determined by evaluating my own corpus of spam).

Modify to suit your needs - in my case, since the added score is relatively low (about 1/5 the total needed to classify a message as spam), it won't generally matter if the rule hits several messages which aren't spam - they'll still have to have either several more minor characteristics, or some strong spam flags in order to be categorized as such and removed from my inbox stream.
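
To make the arithmetic concrete: a hit on the outer recipe alone adds +50, and a hit on the listed tlds adds another +50, for 100 in total. Below is a minimal sketch of how the accumulated string could be totalled and compared further down the rcfile. The threshold of 250 is only inferred from the "about 1/5" remark, and SPAMSCORE and the spam folder are hypothetical names, not part of the recipe above. It also assumes bc is available on the system.

```procmail
# Assumption: SPAMMISHNESS is a string of signed terms, e.g. "+50+50";
# the leading 0 keeps the expression valid for bc
SPAMSCORE=`echo "0${SPAMMISHNESS}" | bc`

# Weighted scoring: the recipe fires only if the final score is above 0,
# i.e. when SPAMSCORE is 250 or more
:0
* $ ${SPAMSCORE}^0
* -249^0
spam
```

Under those assumptions, a message that only trips both levels of the two letter tld rule totals 100 and stays in the inbox unless other characteristics push it over.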

The domains in the list are those which have a higher incidence in my own spam corpus and within which I generally don't have correspondents (though there are exceptions).



I could perform an initial match like so:

* ENVFROM ?? ()@\/.*\...^^
* $ FROM_DOMAIN ?? ^^$MATCH^^
* FROM_DOMAIN ?? ()\.\/..^^

Which would ensure the envelope and From: domains matched (the entire domain portions, not just the tld), then would re-match to acquire the tld as necessary for the second level recipe -- that last condition could be omitted if the tld list isn't going to be checked - or just moved to that recipe.
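
Dropped into the full recipe, the alternative conditions could look like the sketch below. It only reassembles the pieces already shown; the wording of the first note is changed for illustration. The conditions keep $MATCH unescaped as written above; procmail's $\MATCH form would quote the dots in the domain, which may be preferable if the looser match is a concern.

```procmail
:0
* ENVFROM ?? ()@\/.*\...^^
* $ FROM_DOMAIN ?? ^^$MATCH^^
* FROM_DOMAIN ?? ()\.\/..^^
{
        SPAMVAL="+50"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope and From: domains match on a two letter TLD${NL}"

        # MATCH now holds only the tld, re-extracted by the third condition
        :0
        * MATCH ?? ^^(ru|hu|it|br|uy|pl|pt|za|cl|ch|sk|ua|su|cz|cc|sg|tw|ro)^^
        {
                SPAMVAL="+50"
                SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
                SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope sender tld is ${MATCH}${NL}"
        }
}
```

The trade-off is that this stricter form no longer flags a message whose envelope and From: domains differ but share a tld, which the original recipe does.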

Note that because one of my spammishness tests flags based on the number of characteristic matches (i.e. if there are too many characteristics, even minor ones, it'll bump the message to actual spam), a match at the second level of the recipe above will provide 2 of the 7 flags necessary to consider the message spam, even if the ultimate score isn't very high.

Comments anyone (besides arguing about specific tlds, which are a matter of preference)?

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

