Okay, this will be wildly unpopular with those in affected countries, but
as I directly correspond with so few people at two-letter TLDs, this
makes for a reasonable attribute to check:
Variables used in this recipe (ENVFROM and FROM_DOMAIN) are common
extractions which can be found in the sandbox published at my website.
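
For readers without the sandbox at hand, here is a minimal sketch of how two such variables might be extracted. It is an assumption for illustration only, not the sandbox's actual code: it takes ENVFROM from the mbox "From " line and FROM_DOMAIN from whatever follows the @ in the From: header.

```procmail
# Hypothetical extractions; the sandbox versions may well differ.

# Envelope sender: the address on the mbox "From " line.
:0
* ^From +\/[^ ]+
{ ENVFROM=$MATCH }

# Domain part of the address in the From: header.
:0
* ^From:.*@\/[-a-z0-9.]+
{ FROM_DOMAIN=$MATCH }
```

Both values must end at the TLD (no trailing ">" or whitespace) for the ^^ anchors in the recipe to work.
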
:0
* ENVFROM ?? ()\.\/..^^
* $ FROM_DOMAIN ?? ()\.$MATCH^^
{
  SPAMVAL="+50"
  SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
  SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope sender is a two letter TLD${NL}"

  # continuing, we add MORE spammishness if the TLD matches a list
  :0
  * MATCH ?? ^^(ru|hu|it|br|uy|pl|pt|za|cl|ch|sk|ua|su|cz|cc|sg|tw|ro)^^
  {
    SPAMVAL="+50"
    SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
    SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope sender tld is ${MATCH}${NL}"
  }
}
I check both the envelope and the From: domain because some lists I use
remail from a two-letter TLD (er, such as procmail). Further, by
extracting a match on the first one and matching for the SAME TLD on the
second, I reduce (though not eliminate) matches where a poster to a list
hosted at a two-letter TLD happens to also be at a two-letter TLD. If the
two TLDs coincide, yeah, they're going to be flagged, but at least a .uk
poster on a .de list won't be. Granted, I'm seeing plenty of spam that
uses two different domains.
Additionally, in the second (braced) condition level of the recipe, we
optionally match against a list of tlds which are particularly spammy (in
my case, as determined by evaluating my own corpus of spam).
Modify to suit your needs - in my case, since the added score is relatively
low (about 1/5 the total needed to classify a message as spam), it won't
generally matter if the rule hits several messages which aren't spam -
they'll still have to have either several more minor characteristics, or
some strong spam flags in order to be categorized as such and removed from
my inbox stream.
The TLDs in the list are those which have a higher incidence in my own spam
corpus and at which I generally don't have correspondents (though there
are exceptions).
I could perform an initial match like so:
* ENVFROM ?? ()@\/.*\...^^
* $ FROM_DOMAIN ?? ^^$MATCH^^
* FROM_DOMAIN ?? ()\.\/..^^
This would ensure the envelope and From: domains match (the entire
domain portion, not just the TLD), then re-match to acquire the TLD
needed by the second-level recipe. That last condition could be omitted if
the TLD list isn't going to be checked, or simply moved into that recipe.
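
One caveat when a captured domain is interpolated into a condition: its dots act as regex wildcards. A possible variation (an assumption, not something from the recipe above) is procmail's $\name expansion, which substitutes the variable with its regex metacharacters escaped. The SPAMNOTES wording below is made up for the example.

```procmail
# Variation (assumption): $\MATCH expands MATCH with regex
# metacharacters escaped, so "example.it" cannot match "exampleXit".
:0
* ENVFROM ?? ()@\/.*\...^^
* $ FROM_DOMAIN ?? ^^$\MATCH^^
* FROM_DOMAIN ?? ()\.\/..^^
{
  # MATCH now holds the two-character TLD from FROM_DOMAIN
  SPAMVAL="+50"
  SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
  SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} Envelope and From: domains match at a two letter TLD${NL}"
}
```
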
Note that because one of my spammishness tests flags based on the number of
characteristics matched (i.e. if there are too many characteristics, even
minor ones, it'll bump the message to actual spam), a match at the second
level of the recipe above will provide 2 of the 7 flags necessary to
consider the message spam, even if the ultimate score isn't very high.
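
A flag-count test of this kind could be sketched with procmail's weighted scoring. This is a guess at one way to do it, not the actual test in use: it assumes every characteristic appends a "+NN" entry to SPAMMISHNESS, so counting "+" signs counts flags, and SPAMFLAGGED is a hypothetical variable name.

```procmail
# Hypothetical: succeed once seven or more positive
# characteristics have been recorded in SPAMMISHNESS.
# The score starts at -6 and gains 1 for every "+" found;
# the recipe only matches if the final score is positive.
:0
* -6^0
*  1^1 SPAMMISHNESS ?? ()\+
{ SPAMFLAGGED=yes }
```

With the two +50 entries added by the recipe above, the count would stand at 2 of the 7 needed.
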
Comments anyone (besides arguing about specific tlds, which are a matter of
preference)?
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.