procmail

Re: new spam filtering rule

2005-06-29 10:55:47
At 22:16 2005-06-28 +0100, Alan Clifford wrote:
PSE> not generally receive a lot of mail from .xx, and therefore, that small
PSE> segment of their spam can be better classified as iffy by inclusion of this
PSE> check.  No single check is going to thwart all spam.

I guess you need to use what works for you.  But I couldn't help thinking
"Most crime in England is committed by English people but I don't know
many Americans so I better classify them all as iffY"  It's the start of
really bad thinking, both morally and internetally.

You can't possibly believe that mail filtering has anything to do with believing the people who live somewhere are criminals? I'm saying that _I_ don't generally _interact_ with people in those countries via email, and thus if I receive a message claiming to be from some particular country, chances are good that it is unsolicited, is from someone I don't know, and for that matter, is forged. The same goes for filtering based on languages one doesn't speak. As I'm not feeding the messages into an external spam reporting system, if I happen to miscategorize some Chinese BIG5 text as spam when in fact it was a legitimate attempt at communication from someone speaking a different language from me, it simply isn't going to make a difference to anyone else.

I'm *NOT* advocating blocking at a DNSBL using nerd.dk (though I do use my own DNSBL to block China, Indonesia, etc - about six or eight country IP blocks which were at one time responsible for the vast majority of the crap sent at my servers - still are, but at least my servers aren't taking a significant bandwidth hit to refuse them), but rather using the characteristics of the message as part of the overall evaluation of whether the message is legitimate. At the scoring weight I place on this attribute, there'd still have to be several other things wrong with the message to classify it as spam: either a lot (7) of minor things such as this, or something significant.

Hey, I also evaluate posts by the accuracy of their dates - morons who can't set their system clocks are more likely to be classified as spam than those who can manage to keep their clock within 8 hours or so of actual time. I'm a dateist! OMG!
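
A minimal sketch of a date-accuracy check of this kind, written in Python for illustration rather than as a procmail recipe. The 8-hour tolerance comes from the paragraph above; the function name, the choice to treat a missing or unparseable Date: header as suspicious, and the assumption that the receipt time is a timezone-aware datetime are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Tolerance taken from the text ("within 8 hours or so of actual time").
MAX_SKEW = timedelta(hours=8)


def date_is_suspicious(date_header, received_at):
    """Return True if the Date: header is missing, unparseable, or
    differs from the (timezone-aware) receipt time by more than MAX_SKEW."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return True
    if sent is None:
        return True
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC (assumption).
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(received_at - sent) > MAX_SKEW
```

A hit from a check like this would only add to a message's score; it would not classify the message on its own.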

ObQ: do you use CallerID? What if you're on a cellphone, and you pay the airtime charges, even on incoming calls (which is typical here in the US)? Do you elect to take calls from sources which are not only not recognized as a friend, but aren't even the least bit local? I pay for way too many wrong number calls -- people calling my unpublished cellphone number because 10+ years ago, the number was associated with an estate lawyer (i.e. has his info on a lot of old legal documents, such as wills) - who is now in a different area code, so not answering them and letting them get ditched at the voicemail is a lot more convenient for me, and well, doesn't incur an airtime fee (or rather, count against my minutes). Why should my spam filtering be any different?

Further, it isn't as if I'm saying everyone from country 'xx' is a spammer - I'm saying that in general, *I* don't do a whole lot of communicating with people in certain countries - certainly not generally _directly_ (lists are another matter, and the nature of the filter is intended to minimize that). Actually, I'm an admin on a site for British car owners/enthusiasts, and several of the admins of that are internationally based, the website is hosted elsewhere, the physical admin has a 2-letter tld, and of course, many of the users have 2 letter tlds in their email addresses - but virtually all of the communication with them is via discussion lists, not directly. Nor are they (any of the admins, nor much of the user base) in the various tlds I listed in the "second tier" check, which is comprised of countries which seem to be popular for forging spam from for some reason.

And, let me say it again, the test isn't a positive identifier for spam - it's merely a contributing factor, as are most spam tests. Long ago, I found that the key to reducing unwanted email isn't to assume that any one thing means something is definitely spam - it's to take a NUMBER of factors and evaluate them as a whole. This reduces false positives and increases the success rate of the filtering process.
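
The idea of weighing several factors together can be sketched as follows, in Python for illustration rather than as procmail scoring recipes. The threshold of 7 mirrors the "a lot (7)" remark earlier in this message; the test names and individual weights are hypothetical and are not the actual values used in the filter being discussed.

```python
# Hypothetical threshold: roughly seven minor hits, or one significant hit.
SPAM_THRESHOLD = 7

# Hypothetical weights; names and values are illustrative only.
WEIGHTS = {
    "two_letter_tld": 1,        # minor: sender claims a two-letter TLD
    "bad_date": 1,              # minor: Date: header far from actual time
    "unknown_charset": 1,       # minor: language the recipient doesn't read
    "known_spam_signature": 7,  # significant: enough on its own
}


def spam_score(hits, weights=WEIGHTS):
    """Sum the weights of every test that matched the message."""
    return sum(weights[name] for name in hits)


def is_spam(hits, threshold=SPAM_THRESHOLD):
    """Classify as spam only when the combined score reaches the threshold."""
    return spam_score(hits) >= threshold
```

With these assumed weights, a message that only trips the two-letter TLD test scores 1 and is delivered normally; it takes several other minor hits, or one significant one, to reach the threshold.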

For me, a characteristic of a fair number of the messages which have crept through my defences in recent months has been the use of two letter TLDs. By adding this test to my collection, those messages which might not have previously been matched now have a better chance of scoring high enough to be classified as spam.
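
A rough sketch of the two-letter TLD test itself, again in Python for illustration; the actual recipe is in the original message and is not reproduced here. The function name is hypothetical, and the sketch only inspects the address a message claims to be from, which (as noted above) may well be forged.

```python
from email.utils import parseaddr


def has_two_letter_tld(from_header):
    """Return True if the sender address in a From:-style header
    ends in a two-letter (country-code) top-level domain."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return False
    domain = address.rpartition("@")[2].rstrip(".")
    tld = domain.rpartition(".")[2]
    return len(tld) == 2 and tld.isalpha()
```

For example, `has_two_letter_tld("Alan <alan@example.co.uk>")` is true while `has_two_letter_tld("user@example.com")` is false; a true result would be one contributing factor in the overall score, not a verdict.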

[snip]
Oh dear, I have an Ascension Island tld in UK ip space.

Please refer to the final paragraph of my original message.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
