At 13:11 2003-02-01 -0500, fleet(_at_)teachout(_dot_)org did say:
> Received: from tatanka (localhost [127.0.0.1])
> Received: from pxhkieb (12-232-227-51.client.attbi.com [12.232.227.51])
> Received: from muhirhw (www.phila.gov [170.115.249.20] (may be forged))
>
> tatanka is (probably?) meaningful;

Ironic, since that one is localhost rather than a routed IP, so it doesn't
identify a host on the internet at all.

> whereas pxhkieb and muhirhw would appear (to me) to be gibberish and
> designed to plug "something" into a required field or to disguise something.

Gibberish to you may be a legitimate word in another language. Or, for that
matter, it may just be a series of letters and digits making up a hostname
(Windows 2000 does some goofy stuff with default hostnames, for instance)
using some code unique to the organization where the host is. Dictionary
checks will be difficult because of the many languages that could be used.
Don't forget proper names.

> I suppose it's "life experience" (or something); but "tatanka" appears to
> be "ok" (with minor reservations) whereas "pxhkieb" and "muhirhw" are (to
> me) immediately suspect. How does one describe this in code?
You could probably use character classes to declare that runs of more than
two consonants are suspect. Look at the distribution of vowels and
consonants within English words and you might notice that pattern. I think
it'd be a lot of work to achieve and would still give false hits, both with
non-English messages and with encoded hostnames.
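To get a feel for the vowel/consonant distribution idea, here is a small offline sketch in Python (not procmail) that reports the longest consonant run and the fraction of vowels in a name. It assumes the same letter classes as the recipe below, with y counted as a vowel; the function names are made up for illustration.

```python
# Offline sketch: vowel/consonant profile of a hostname label.
# Assumption: same letter classes as the procmail recipe (y counted as a vowel).
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfghjklmnpqrstvwxz")


def longest_consonant_run(name: str) -> int:
    """Length of the longest run of consecutive consonants in name.

    Any non-consonant (vowel, digit, dot, hyphen) ends the current run.
    """
    best = run = 0
    for ch in name.lower():
        if ch in CONSONANTS:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def vowel_fraction(name: str) -> float:
    """Fraction of the letters in name that are vowels (0.0 if no letters)."""
    letters = [ch for ch in name.lower() if ch in VOWELS or ch in CONSONANTS]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)


if __name__ == "__main__":
    for host in ("tatanka", "pxhkieb", "muhirhw"):
        print(f"{host}: longest consonant run {longest_consonant_run(host)}, "
              f"vowel fraction {vowel_fraction(host):.2f}")
```

On the three names quoted above, only tatanka stays at two consonants in a row; pxhkieb and muhirhw both reach three or more, which is the property the recipe keys on.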
# ... and sometimes y, so let's just include it.
VOWEL=[aeiouy]
CONSONANT=[bcdfghjklmnpqrstvwxz]
:0:
* $ ^Received:[ ]*from[ ]*\
\/[-\._a-z0-9]*${CONSONANT}${CONSONANT}${CONSONANT}[^ ]*
suspect.mbx
(The trailing negated-whitespace class, [^ ]*, is there so that the \/ match
operator captures just the hostname after "from" into $MATCH, and, if you're
running with VERBOSE on, your log will show you what matched.)
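To try the condition outside procmail, the sketch below is a rough Python transcription of it. Assumptions: the [ ] classes in the recipe hold a space and a tab; procmail's default case-insensitive matching is mimicked with re.IGNORECASE, and ^ matching at the start of each header line with re.MULTILINE; a capture group stands in for what \/ puts into MATCH. The [^ ]* tail is narrowed to [^ \n]* so a capture cannot run onto the next line, which is a liberty taken here rather than something the recipe does.

```python
import re
from typing import Optional

# Hypothetical transcription of the recipe's condition (not procmail itself).
CONSONANT = "[bcdfghjklmnpqrstvwxz]"

PATTERN = re.compile(
    r"^Received:[ \t]*from[ \t]*"
    # Group 1 stands in for the part after \/ (what procmail stores in MATCH):
    # optional hostname characters, three consonants in a row, then the rest
    # of the token up to a space (or end of line, a liberty taken here).
    r"([-._a-z0-9]*" + CONSONANT * 3 + r"[^ \n]*)",
    re.IGNORECASE | re.MULTILINE,
)


def suspect_hostname(header: str) -> Optional[str]:
    """Return the hostname the condition would capture, or None if no match."""
    m = PATTERN.search(header)
    return m.group(1) if m else None


if __name__ == "__main__":
    received = [
        "Received: from tatanka (localhost [127.0.0.1])",
        "Received: from pxhkieb (12-232-227-51.client.attbi.com [12.232.227.51])",
        "Received: from muhirhw (www.phila.gov [170.115.249.20] (may be forged))",
    ]
    for line in received:
        print(suspect_hostname(line))
```

Against the three Received lines quoted at the top, the tatanka line should not match, while pxhkieb and muhirhw are captured whole.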
I ran a scan against some miscellaneous messages (NOT spam), and it revealed
many exceptions. Some of them:
apple
ultra
sepulchre
bftoemail (used by bigfoot)
uprrsmtp1 (gaak, "smtp" alone trips it; several other hosts trip on the same thing)
dws.disney.com (the 'dws' part)
arachna
dbn.net
bestweb.net
mta2.rcsntx.swbell.net
planetworks.com
biz1.mailsrvcs.net
mail.starbright-direct.com
rs2s3.datacenter.cha.cantv.net
(etc)
So, that test isn't going to go very far.
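The false hits are easy to reproduce for the hostnames listed. This Python sketch (an illustration, not the scan described above) applies only the recipe's three-consonant test to each name and reports the first run that trips it; the list and function names are made up for the example.

```python
import re
from typing import Optional

# Only the recipe's core test: three consonants in a row, matched
# case-insensitively (procmail's default). Same consonant class as the recipe.
CONSONANT = "[bcdfghjklmnpqrstvwxz]"
TRIPLE = re.compile(CONSONANT * 3, re.IGNORECASE)

# Hostnames from the non-spam scan listed above.
LEGITIMATE = [
    "apple", "ultra", "sepulchre", "bftoemail", "uprrsmtp1",
    "dws.disney.com", "arachna", "dbn.net", "bestweb.net",
    "mta2.rcsntx.swbell.net", "planetworks.com", "biz1.mailsrvcs.net",
    "mail.starbright-direct.com", "rs2s3.datacenter.cha.cantv.net",
]


def first_trip(host: str) -> Optional[str]:
    """Return the first run of three consonants found in host, or None."""
    m = TRIPLE.search(host)
    return m.group(0) if m else None


if __name__ == "__main__":
    flagged = [h for h in LEGITIMATE if first_trip(h) is not None]
    for host in LEGITIMATE:
        print(f"{host:32} {first_trip(host)}")
    print(f"{len(flagged)} of {len(LEGITIMATE)} legitimate names would be flagged")
```

Every name in the list trips on some run (ppl, ltr, dws, and so on), which is the point: ordinary English words and abbreviation-heavy hostnames fail the test as readily as the gibberish ones.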
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail