"Tony L. Svanstrom" <tony(_at_)svanstrom(_dot_)com> wrote:
On Sat, 1 Feb 2003 the voices made fleet(_at_)teachout(_dot_)org write:
Is there any way to differentiate (in procmail) between
a random collection of letters/numbers and "valid"
words/acronyms/abbreviations?
Not using only procmail, but there are several solutions to finding
out if something is just garbage or a name/word; both statistical and
simply searching a dictionary.
I do some rudimentary checks in procmail. They give me some success.
Here, for example, is one that looks for not-entirely-short From: addresses
that have no vowels or no consonants, which is, well, just weird.
:0 # 021203 () sender's longish local address has no vowels or no consonants
* $ $GO^0 ! LOCALPART ?? [$VOWELS]
* $ $GO^0 ! LOCALPART ?? [$CONSONANTS]
* LOCALPART ?? [a-z]
* LOCALPART ?? .....
{ RX = "${RX:+$RX, }UBE.FR.!(VOWEL|CONSONANT)" }
You need to know that $LOCALPART is a private variable I've set that contains
the local part of the sender's putative address. $GO is an "oversaturated"
supremum value; and $VOWELS and $CONSONANTS, well, should be obvious.
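A minimal sketch of how such variables might be set up follows. The actual
definitions were not posted, so the values shown, the FROM variable name, and
the use of formail to obtain the sender's address are all assumptions for
illustration only.

```
# Assumed definitions; the real values were not posted.
VOWELS=aeiouy
CONSONANTS=bcdfghjklmnpqrstvwxz

# Assumed: a weight large enough to saturate procmail's score on one hit.
GO=9876543210

# Assumed: the sender's reply address, as computed by formail.
FROM=`formail -rtzxTo:`

# Keep everything before the first "@" as the local part.
:0
* FROM ?? ^\/[^@]+
{ LOCALPART=$MATCH }
```

With definitions like these, a local part such as "xkcdqrt" has no character
in [$VOWELS], so the first weighted condition of the recipe above would
contribute $GO to the score. Whether "y" counts as a vowel is a judgment call
and changes which addresses are flagged.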
This is a fairly low-hit recipe. It caught four of the last 100 of my
spam messages. But while it obviously has some exposure for false positives,
it is surprisingly stable in that regard.
I have some other recipes that are either more experimental or sufficiently
more complex (and ugly) that I don't feel like posting them -- but one
looks, for example, for strings of too many consonants and numbers at the end
of Subject: lines after extra space, and so on. That one catches a lot!
--
dman
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail