Re: new spam filtering rule

At 22:16 2005-06-28 +0100, Alan Clifford wrote:

PSE> not generally receive a lot of mail from .xx, and therefore, that small

PSE> segment of their spam can be better classified as iffy by inclusion of this

PSE> check.  No single check is going to thwart all spam.

I guess you need to use what works for you.  But I couldn't help thinking
"Most crime in England is committed by English people but I don't know
many Americans so I better classify them all as iffy."  It's the start of
really bad thinking, both morally and internetally.

You can't possibly believe that mail filtering has anything to do with believing the people who live somewhere are criminals? I'm saying that _I_ don't generally _interact_ with people in those countries via email, and thus if I receive a message claiming to be from some particular country, chances are good that it is unsolicited, is from someone I don't know, and for that matter, is forged. Same goes for filtering based on languages one doesn't speak. As I'm not feeding the messages into an external spam reporting system, if I happen to miscategorize some Chinese BIG5 text as spam, when in fact it was a legitimate attempt at communication from someone speaking a different language from me, it simply isn't going to make a difference to anyone else.

I'm *NOT* advocating blocking at a DNSBL using nerd.dk (though I do use my own DNSBL to block China, Indonesia, etc - about six or eight country IP blocks which were at one time responsible for the vast majority of the crap sent at my servers - still are, but at least my servers aren't taking a significant bandwidth hit to refuse them), but rather using the characteristics of the message as part of the overall evaluation of whether the message is legitimate. At the scoring weight I place on this attribute, there'd still have to be several other things wrong with the message to classify it as spam: either a lot (7) of minor things such as this, or something significant.

Hey, I also evaluate posts by the accuracy of their dates - morons who can't set their system clocks are more likely to be classified as spam than those who can manage to keep their clock within 8 hours or so of actual time. I'm a dateist! OMG!
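
A minimal sketch of a date-accuracy check like the one described above, written in Python rather than as a procmail recipe. The function name, the 8-hour constant and the choice to treat a missing or unparseable Date: header as suspect are all assumptions made for illustration, not the actual filter.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance, borrowed from the "8 hours or so" figure in the text.
MAX_SKEW = timedelta(hours=8)


def date_is_suspect(date_header, received_at, max_skew=MAX_SKEW):
    """Return True if the Date: header is unusable or too far from received_at.

    received_at must be a timezone-aware datetime (the local delivery time).
    """
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Missing or malformed header: counted as suspect in this sketch.
        return True
    if sent.tzinfo is None:
        # A "-0000" zone parses to a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(received_at - sent) > max_skew
```

A hit from a check like this would only add to a message's score; on its own it decides nothing.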

ObQ: do you use CallerID? What if you're on a cellphone, and you pay the airtime charges, even on incoming calls (which is typical here in the US)? Do you elect to take calls from sources which are not only not recognized as a friend, but aren't even the least bit local? I pay for way too many wrong number calls -- people calling my unpublished cellphone number because 10+ years ago, the number was associated with an estate lawyer (i.e. has his info on a lot of old legal documents, such as wills) - who is now in a different area code, so not answering them and letting them get ditched at the voicemail is a lot more convenient for me, and well, doesn't incur an airtime fee (or rather, count against my minutes). Why should my spam filtering be any different?

Further, it isn't as if I'm saying everyone from country 'xx' is a spammer - I'm saying that in general, *I* don't do a whole lot of communicating with people in certain countries - certainly not generally _directly_ (lists are another matter, and the nature of the filter is intended to minimize that). Actually, I'm an admin on a site for British car owners/enthusiasts, and several of the admins of that are internationally based, the website is hosted elsewhere, the physical admin has a 2-letter tld, and of course, many of the users have 2 letter tlds in their email addresses - but virtually all of the communication with them is via discussion lists, not directly. Nor are they (any of the admins, nor much of the user base) in the various tlds I listed in the "second tier" check, which is comprised of countries which seem to be popular for forging spam from for some reason.

And, let me say it again, the test isn't a positive identifier for spam - it's merely a contributing factor, as are most spam tests. Long ago, I found that the key to reducing unwanted email isn't to assume that any one thing means something is definitely spam - it's to take a NUMBER of factors and evaluate them as a whole. This reduces false positives and increases the success rate of the filtering process.
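
To illustrate the "evaluate them as a whole" idea, here is a rough sketch of additive scoring in Python. The test names, the weights and the threshold of 7 are hypothetical, picked only so that about seven minor hits, or one significant hit, cross the line; they are not the real rule set.

```python
# Hypothetical weights: minor indicators score 1, a significant one scores 7.
WEIGHTS = {
    "two_letter_tld": 1,
    "bad_date": 1,
    "unknown_charset": 1,
    "significant_indicator": 7,
}

# Assumed cutoff: the total a message must reach to be classified as spam.
SPAM_THRESHOLD = 7


def classify(hits, weights=WEIGHTS, threshold=SPAM_THRESHOLD):
    """Sum the weights of the tests that matched and compare to the threshold.

    hits is an iterable of test names that fired for one message.
    Returns (score, is_spam).
    """
    score = sum(weights[name] for name in hits)
    return score, score >= threshold
```

With these numbers a message that trips only the two-letter TLD test scores 1 and is delivered normally; it takes a pile of other minor hits, or one significant one, to tip it over.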

For me, a characteristic of a fair number of the messages which have crept through my defences in recent months has been the use of two letter TLDs. By adding this test to my collection, those messages which might not have previously been matched now have a better chance of scoring high enough to be classified as spam.
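
A sketch of what a two-letter TLD test could look like, again in Python rather than procmail. It assumes the check looks only at the From: header, and the helper names are made up for the example; the real recipe may examine other headers.

```python
from email.utils import parseaddr


def sender_tld(from_header):
    """Return the lowercased top-level domain of the From: address, or ''."""
    _, addr = parseaddr(from_header)
    if "@" not in addr:
        return ""
    domain = addr.rpartition("@")[2]
    if "." not in domain:
        return ""
    return domain.rsplit(".", 1)[1].lower()


def has_two_letter_tld(from_header):
    """True when the sender's domain ends in a two-letter (country-code) TLD."""
    tld = sender_tld(from_header)
    return len(tld) == 2 and tld.isascii() and tld.isalpha()
```

The result is meant to feed a score as one minor factor, not to reject a message by itself.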


[snip]

Oh dear, I have an Ascension Island tld in UK ip space.


Please refer to the final paragraph of my original message.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail