procmail

Re: Garbage vs Valid

2003-02-01 12:50:15
At 13:11 2003-02-01 -0500, fleet(_at_)teachout(_dot_)org did say:

Received: from tatanka (localhost [127.0.0.1])
Received: from pxhkieb (12-232-227-51.client.attbi.com [12.232.227.51])
Received: from muhirhw (www.phila.gov [170.115.249.20] (may be forged))

tatanka is (probably?) meaningful;

Ironic, since it is localhost rather than a routed IP, and thus doesn't represent a hostname on the internet itself.

whereas pxhkieb and muhirhw would appear (to me) to be gibberish and designed to plug "something" into a required field or to disguise something.

Gibberish to you may be legitimate in another language. Or, for that matter, it may be just a series of letters and digits representing a hostname (Windows 2000 does some goofy stuff for default hostnaming, for instance), using some representative code unique to the organization where the host is.

Dictionary checks will be difficult due to the many languages which could be used. Don't forget proper names.

I suppose it's "life experience" (or something); but "tatanka" appears to
be "ok" (with minor reservations) whereas "pxhkieb" and "muhirhw" are (to
me) immediately suspect.  How does one describe this in code?

You could probably use character classes to express that runs of more than two consonants are suspect. Look at the distribution of vowels and consonants within English words and you might note that pattern.

I think it'd be a lot of work to achieve and would still give false hits, both with non-English messages and with encoded hostnames.


# ... and sometimes y, so let's just include it.
VOWEL=[aeiouy]
CONSONANT=[bcdfghjklmnpqrstvwxz]

:0:
* $ ^Received:[         ]*from[         ]*\
        \/[-\._a-z0-9]*${CONSONANT}${CONSONANT}${CONSONANT}[^   ]*
suspect.mbx

(The trailing negated-whitespace class is there so that the match operator captures just the "from" hostname portion; if you're running verbose, your log will show you what matched.)
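
For trying the idea outside of procmail, here is a rough Python sketch of the same condition. It is not part of the recipe; the names SUSPECT and suspect_host are made up for illustration, and the regex is my approximation of the procmail expression (case-insensitive, as procmail matching is by default).

```python
import re

# Same consonant class as the recipe; "y" is treated as a vowel there.
CONSONANT = "[bcdfghjklmnpqrstvwxz]"

# Approximates the procmail condition: skip "Received: from", then capture
# a token that contains a run of three consonants, up to the next space/tab.
SUSPECT = re.compile(
    r"^Received:[ \t]*from[ \t]*"
    r"([-._a-z0-9]*" + CONSONANT + r"{3}[^ \t]*)",
    re.IGNORECASE,
)

def suspect_host(header):
    """Return the captured hostname token if the header trips the test, else None."""
    m = SUSPECT.match(header)
    return m.group(1) if m else None

# The three headers quoted at the top of this message.
headers = [
    "Received: from tatanka (localhost [127.0.0.1])",
    "Received: from pxhkieb (12-232-227-51.client.attbi.com [12.232.227.51])",
    "Received: from muhirhw (www.phila.gov [170.115.249.20] (may be forged))",
]

for h in headers:
    print(suspect_host(h), "<-", h)
```

Only the first token after "from" is examined, so the parenthesised reverse-lookup name is never tested. On these three headers the sketch flags pxhkieb and muhirhw and lets tatanka through.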


A scan I ran against some miscellaneous messages (NOT spam) revealed many exceptions:

Some exceptions:
        apple
        ultra
        sepulchre
        bftoemail               (used by bigfoot)
        uprrsmtp1               (gaak, smtp alone trips it, several other
                                hosts trip the same thing)
        dws.disney.com          (the 'dws' part)
        arachna
        dbn.net
        bestweb.net
        mta2.rcsntx.swbell.net
        planetworks.com
        biz1.mailsrvcs.net
        mail.starbright-direct.com
        rs2s3.datacenter.cha.cantv.net

(etc)

So, that test isn't going to go very far.
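
To see why each of those names trips the test, a small Python sketch (again just an illustration, with the hypothetical helper first_run) can report the first three-consonant run found in each one:

```python
import re

# Three consecutive consonants, using the same class as the recipe
# ("y" counted as a vowel); case-insensitive like procmail's default.
RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{3}", re.IGNORECASE)

def first_run(host):
    """Return the first run of three consonants in host, or None."""
    m = RUN.search(host)
    return m.group(0) if m else None

# Legitimate names taken from the exception list above.
EXCEPTIONS = [
    "apple", "ultra", "sepulchre", "bftoemail", "uprrsmtp1",
    "dws.disney.com", "arachna", "dbn.net", "bestweb.net",
    "mta2.rcsntx.swbell.net", "planetworks.com", "biz1.mailsrvcs.net",
    "mail.starbright-direct.com", "rs2s3.datacenter.cha.cantv.net",
]

for host in EXCEPTIONS:
    print(f"{host:32} {first_run(host)}")
```

Every entry yields a run (for example "ppl" in apple and "rks" in planetworks.com), while "tatanka" yields none. Ordinary English words and common abbreviations are enough to set it off.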

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
