Re: new spam filtering rule
2005-06-29 10:55:47
At 22:16 2005-06-28 +0100, Alan Clifford wrote:
PSE> not generally receive a lot of mail from .xx, and therefore, that small
PSE> segment of their spam can be better classified as iffy by inclusion
PSE> of this check. No single check is going to thwart all spam.
I guess you need to use what works for you. But I couldn't help thinking
"Most crime in England is committed by English people but I don't know
many Americans so I better classify them all as iffy". It's the start of
really bad thinking, both morally and internetally.
You can't possibly believe that mail filtering has anything to do with
believing the people who live somewhere are criminals? I'm saying that _I_
don't generally _interact_ with people in those countries via email, and
thus if I receive a message claiming to be from some particular country,
chances are good that it is unsolicited, is someone I don't know, and for
that matter, is forged. Same goes for filtering based on languages one
doesn't speak. As I'm not feeding the messages into an external spam
reporting system, if I happen to miscategorize some Chinese Big5 text as
spam, when in fact it was a legitimate attempt at communication from
someone speaking a different language from me, it simply isn't going to
make a difference to anyone else.
I'm *NOT* advocating blocking at a DNSBL using nerd.dk. I do use my own
DNSBL to block China, Indonesia, etc. (about six or eight country IP
blocks which were at one time responsible for the vast majority of the crap
sent at my servers; they still are, but at least my servers aren't taking a
significant bandwidth hit to refuse them). What I'm advocating is using the
characteristics of the message as part of the overall evaluation of whether
the message is legitimate. At the scoring weight I place on this
attribute, there'd still have to be several other things wrong with the
message to classify it as spam: either a lot (7) of minor things such as
this, or something significant.
Hey, I also evaluate posts by the accuracy of their dates - morons who
can't set their system clocks are more likely to be classified as spam than
those who can manage to keep their clock within 8 hours or so of actual
time. I'm a dateist! OMG!
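For illustration only, here is a minimal Python sketch of that kind of
clock check. It is not the procmail recipe behind the test described
above; the function name and the choice to treat a missing or unparseable
Date: header as suspicious are assumptions. Only the 8-hour tolerance comes
from the paragraph above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Tolerance taken from the post ("within 8 hours or so"); adjust to taste.
MAX_SKEW = timedelta(hours=8)


def date_is_suspicious(date_header, received_at):
    """Return True if the Date: header is missing, unparseable, or more
    than MAX_SKEW away from received_at (a timezone-aware datetime)."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Assumption: a Date: header we cannot parse counts as a hit.
        return True
    if sent.tzinfo is None:
        # No usable zone information in the header; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(received_at - sent) > MAX_SKEW
```

A hit from a check like this would only add to a message's score; it would
not condemn the message on its own.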
ObQ: do you use Caller ID? What if you're on a cellphone, and you pay the
airtime charges, even on incoming calls (which is typical here in the
US)? Do you elect to take calls from sources which are not only not
recognized as a friend, but aren't even the least bit local? I pay for way
too many wrong-number calls: people calling my unpublished cellphone
number because 10+ years ago, the number was associated with an estate
lawyer (i.e. it's on a lot of old legal documents, such as wills) who is
now in a different area code. Not answering them and letting them get
ditched at the voicemail is a lot more convenient for me, and, well,
doesn't incur an airtime fee (or rather, count against my minutes). Why
should my spam filtering be any different?
Further, it isn't as if I'm saying everyone from country 'xx' is a spammer
- I'm saying that in general, *I* don't do a whole lot of communicating
with people in certain countries - certainly not generally _directly_
(lists are another matter, and the nature of the filter is intended to
minimize that). Actually, I'm an admin on a site for British car
owners/enthusiasts, and several of the admins of that are internationally
based, the website is hosted elsewhere, the physical admin has a 2-letter
tld, and of course, many of the users have 2 letter tlds in their email
addresses - but virtually all of the communication with them is via
discussion lists, not directly. Nor are they (any of the admins, nor much
of the user base) in the various TLDs I listed in the "second tier" check,
which is made up of countries that, for some reason, seem to be popular to
forge spam from.
And, let me say it again, the test isn't a positive identifier for spam -
it's merely a contributing factor, as are most spam tests. Long ago, I
found that the key to reducing unwanted email isn't to assume that any one
thing means something is definitely spam - it's to take a NUMBER of factors
and evaluate them as a whole. This reduces false positives and increases
the success rate of the filtering process.
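The "evaluate them as a whole" idea can be sketched as a weighted sum, here
in Python rather than procmail scoring syntax. Every name and number below
is hypothetical. The only figure borrowed from this message is that roughly
seven minor hits are needed before a message is classified as spam.

```python
# Hypothetical threshold: total score at which a message is filed as spam.
SPAM_THRESHOLD = 7.0

# Hypothetical weight for a minor test, chosen so that about seven minor
# hits are needed to reach the threshold.
MINOR_WEIGHT = 1.0

# Hypothetical example of "something significant": one test heavy enough
# to reach the threshold by itself.
WEIGHTS = {
    "forged_received": 7.0,
}


def spam_score(hits):
    """Sum the weights of every test that fired; unlisted tests are minor."""
    return sum(WEIGHTS.get(name, MINOR_WEIGHT) for name in hits)


def is_spam(hits):
    """Classify on the combined score, never on a single minor test."""
    return spam_score(hits) >= SPAM_THRESHOLD
```

With these made-up weights, a two-letter TLD hit alone scores 1.0 and the
message is delivered normally; it takes several other hits, or one heavy
one, to cross the threshold.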
For me, a characteristic of a fair number of the messages which have crept
through my defences in recent months has been the use of two letter
TLDs. By adding this test to my collection, those messages which might not
have previously been matched now have a better chance of scoring high
enough to be classified as spam.
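As a rough illustration of the two-letter TLD test, again in Python and not
the actual recipe: the sketch below pulls the TLD out of a From: header and
reports a hit when it is two letters long and not on a list of TLDs the
recipient regularly corresponds with. The function names and the contents
of FAMILIAR_TLDS are assumptions.

```python
from email.utils import parseaddr

# Hypothetical: two-letter TLDs the recipient does expect direct mail from.
FAMILIAR_TLDS = frozenset({"us"})


def sender_tld(from_header):
    """Return the lowercased TLD of the From: address, or "" if none."""
    _, addr = parseaddr(from_header)
    _, at, domain = addr.rpartition("@")
    if not at or "." not in domain:
        return ""
    return domain.rpartition(".")[2].lower()


def two_letter_tld_hit(from_header, familiar=FAMILIAR_TLDS):
    """True if the sender's TLD is a two-letter one not on the familiar list."""
    tld = sender_tld(from_header)
    return (
        len(tld) == 2
        and tld.isascii()
        and tld.isalpha()
        and tld not in familiar
    )
```

The result is meant to feed a score as one minor factor, not to reject the
message outright.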
[snip]
Oh dear, I have an Ascension Island tld in UK ip space.
Please refer to the final paragraph of my original message.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail