Re: no challenge, but should spammers be blacklisted?

On Wed, Jul 27, 2005 at 09:15:33PM -0400, Chris Payne wrote:

> The spam that I see arriving to my organization contains
> originating RECEIVED headers originating DOMAIN NAME + ([IP
> address]) which do not resolve to one-another.


You are right that it would be a good thing to be able to
check, if it were not a relatively heavy kind of operation
that might bog down an otherwise thin-running procmail
setup.

The thing is, there are countless things one can see about spam
that a trained human eye can nearly instantly recognize as creepy.
The art, in programming against spam, is to choose the signifiers
that are both reasonably consistent and frequent *and* are
not processor-/memory-intensive to flesh out.

Here is sort of a silly example to make a point: spammers
make up names all the time that are obvious nonsense to a
human.  But it still would take enormous programming effort
to "teach" procmail to discern to a satisfactory degree
"real" names from silly ones.[1]

I noticed a spam in my spampile yesterday that had a female
name in the address-part of the From: header, but a male name
in the "comment" area of the field.  For example, something
like this:

   From: "Paul Harvey" <mary(_dot_)smith(_at_)barnard(_dot_)edu>

Well, to a human's eye, that's so obviously bogus that it
raises the suspicion bar at once to the "probably spam"
level.  How do you tell the machine that, though?
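
To show why that is harder than it looks, here is a crude sketch in
Python (not procmail, and not a method anyone on this list is known
to use).  It flags a From: value when no word of the display name
shows up in the local part of the address.  The function names, the
three-letter minimum, and the example.edu addresses are all
assumptions made for illustration.

```python
import re
from email.utils import parseaddr


def name_tokens(text):
    """Lowercase alphabetic chunks of at least three letters."""
    return {t for t in re.split(r"[^a-z]+", text.lower()) if len(t) >= 3}


def display_name_mismatch(from_value):
    """True when display name and local part share no name-like token.

    from_value is the header value only, without the "From:" prefix.
    """
    display, address = parseaddr(from_value)
    local_part = address.partition("@")[0]
    display_tokens = name_tokens(display)
    local_tokens = name_tokens(local_part)
    if not display_tokens or not local_tokens:
        return False  # nothing to compare, so do not flag
    # A token "matches" if one is contained in the other (smith / msmith).
    return not any(
        d in l or l in d for d in display_tokens for l in local_tokens
    )


if __name__ == "__main__":
    print(display_name_mismatch('"Paul Harvey" <mary.smith@example.edu>'))
```

Note that this knows nothing about male versus female names, and it
would also flag perfectly honest mail sent from role addresses or
nicknames, which is exactly the difficulty.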

That said, I do a fair amount of procmail testing of the lower part
of the Received chain, and it is effective.  One simply has to
realize that some heuristics are good for algorithm "fodder" to be
fed to the machine, while others aren't.  Some are good for
procmail-only, while others would require a "heavy" (lead) :-)
pipe.

One can collect the host name and the IP addresses down there
and pipe them out to another program or script.  That's a fair load,
though, in my estimation, and gets pretty far afield from
pure procmail.
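
For a sense of what such an external script involves, here is a
minimal sketch in Python.  It assumes a single, unfolded Received
line in the common "from HOST (... [IP])" shape, handles IPv4 only,
and does one forward DNS lookup per message, which is the cost at
issue.  The function names and the injectable lookup parameter are
hypothetical, chosen so the logic can be exercised without touching
the network.

```python
import re
import socket

# Matches the common "from HOST (... [a.b.c.d])" shape of a Received line.
RECEIVED_RE = re.compile(r"from\s+(\S+)\s+\(.*?\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_received(header):
    """Return (host, ip) from a Received line, or None if it does not fit."""
    m = RECEIVED_RE.search(header)
    return (m.group(1).lower(), m.group(2)) if m else None


def system_lookup(host):
    """Set of IPv4 addresses the host resolves to (empty on failure)."""
    try:
        return set(socket.gethostbyname_ex(host)[2])
    except (OSError, UnicodeError):
        return set()


def host_matches_ip(header, lookup=system_lookup):
    """True/False if the claimed host does/does not resolve to the IP.

    Returns None when the header cannot be parsed at all.
    """
    parsed = parse_received(header)
    if parsed is None:
        return None
    host, ip = parsed
    return ip in lookup(host)
```

A mismatch is only a hint, not proof: plenty of legitimate hosts
have sloppy DNS, so a result like this is better used as one input
among several than as a verdict.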

Without having to resort to that, I am able to stop nearly
all spam right now, with almost no false negatives and only
a smattering of false positives, with procmail alone.  It has
been a well-thought-through project, though, and I was lucky
to be able to rely on solid past experience writing procmail
code and a few years' worth of studying the anti-spam
heuristics that interest me.

Unfortunately, there is also the issue that as soon as one
broadly publicizes methods, well, the enemy re-groups and
escalates the war.  Sometimes, altruistic as one might wish
to be, it just doesn't make sense to divulge *all* one's
secrets.

I try to find a good middle ground and share things with this
public list that I think can help, while not giving away every
trick I know to the degree that the whole bagful would soon be
useless to me.  I think interactive education among active advanced
procmailers via a collective effort is important (i.e., the charter
-- unwritten though it might be -- for this list).  Methods and
techniques can be discussed and shared, and are.  But I am a bit
skeptical about a public Holy Grail.

I can only say that the Way Out is to refine one's heuristic
thinking continually, and thereafter the working algorithms that
put it into action.  Spam really is different from non-spam, in
significant ways that elude most brute-force warriors who use
"atomic" material to fight what by rights need be nothing more
than a clever tactical ground war.

I have more to say on the subject, but this is long already.


[1]  I do test for certain far-outlying vowel/consonant string runs,
though, as a decent indicator.
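
As an illustration of the general idea only (the actual test used
is not given here), a run-length check might look like the Python
sketch below.  The thresholds, the function names, and the choice
to count "y" as neither vowel nor consonant are all assumptions.

```python
import re

# "y" is deliberately left out of both classes, so it breaks a run.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]+", re.IGNORECASE)
VOWEL_RUN = re.compile(r"[aeiou]+", re.IGNORECASE)


def longest_run(pattern, text):
    """Length of the longest stretch of text matching the pattern."""
    return max((len(m.group()) for m in pattern.finditer(text)), default=0)


def looks_like_gibberish(name, max_consonants=5, max_vowels=4):
    """True when a consonant or vowel run exceeds its threshold."""
    return (
        longest_run(CONSONANT_RUN, name) > max_consonants
        or longest_run(VOWEL_RUN, name) > max_vowels
    )
```

The limits need to be generous: ordinary names such as "Schwartz"
already contain four consonants in a row, so only far-outlying runs
should count.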

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail