procmail

host query optimization?

2004-05-13 14:10:44
Someone might want to tinker with this. I'm just doing some evaluation of the process, but there are a number of limitations (most of which I don't think completely cripple it, so long as one uses a spammishness scoring approach and doesn't set the score for this test too high).


Pros:
        corporate users generally identified ok

        users of large ISPs generally identified ok

        LOTS of spew fairly consistently tagged

Cons:
        People who have their own domain, but don't actually use a dedicated
        host for it (i.e. neither their own system nor their mailhost actually
        bears the name of their own domain) show as false positives.  You know
        who you are.

        People posting through lists which purge the Received: headers added
        prior to list delivery (IMO, a bad practice, since it reduces downline
        abuse management) will be flagged as false positives.

        Involves an external process, including a DNS lookup, just to condense
        the host to the base domain component (yea, not even a DNSBL lookup,
        just a host parse).

        Relies upon the output syntax of the external process.  An upgrade to
        BIND could change the format (as it has in the past).

        ISPs operating under multiple aliases (rcn.com + rcn.net) but using
        mailhosts specific to the one domain will show as false positives
        for mail sent from the other.


BASEHOST=`host -v -t SOA $FROM_DOMAIN`

:0
* BASEHOST ?? ^\/[-_.a-z0-9]*\.[        ].*[    ]SOA[   ]
* MATCH ?? ^\/[-_.a-z0-9]*[^    .]
{
        BASEHOST=$MATCH
}

:0E
{
        BASEHOST
}

# yea, easy to forge, but we check anyway.
:0
* ! BASEHOST ?? ^^^^
* 1^0
* $ -1^0 Received:.*\>$\BASEHOST\>
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} from_domain not present in Received chain${NL}"
}
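
For anyone who wants to poke at the Received check outside of procmail, here is a rough shell/awk approximation. It is only a sketch: the function name received_mentions and its argument are made up for illustration, the sample headers use reserved example names, and it does not reproduce procmail's own matching exactly. It unfolds continuation lines, looks only at Received: headers, and wants the domain not to be glued to other hostname characters. It shares the recipe's weakness that anything can be forged into a Received line.

```shell
# received_mentions DOMAIN  (hypothetical helper, not part of procmail)
# Reads a header block on stdin; exit status 0 if DOMAIN appears in any
# Received: header as a whole name, 1 otherwise.
received_mentions() {
    [ -n "$1" ] || return 1
    awk -v dom="$1" '
        function check(line,    s, pos, start, before, after) {
            s = tolower(line)
            if (s !~ /^received:/) return
            start = 1
            while ((pos = index(substr(s, start), dom)) > 0) {
                pos = start + pos - 1
                before = (pos > 1) ? substr(s, pos - 1, 1) : " "
                after = substr(s, pos + length(dom), 1)
                # reject hits embedded in a longer label, e.g. notexample.org
                if (before !~ /[-a-z0-9]/ && after !~ /[-a-z0-9]/) found = 1
                start = pos + 1
            }
        }
        BEGIN { dom = tolower(dom) }
        /^[ \t]/ { cur = cur " " $0; next }   # folded header: append to current
        { check(cur); cur = $0 }              # new header: test the previous one
        END { check(cur); exit(found ? 0 : 1) }
    '
}

# Illustrative headers only (reserved example names, documentation IP range).
received_mentions example.org <<'EOF' && echo "present" || echo "absent"
Received: from mail.example.org (mail.example.org [192.0.2.10])
 by mx.example.net with ESMTP
From: someone@list.example.org
EOF
# prints: present
```

A domain that shows up only in From: (and not in any Received: header) comes back with exit status 1, which is the case the recipe above scores.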


FROM_DOMAIN gets set in my standard header extractions (which you can find in my sandbox).


Is anyone aware of a more streamlined method of resolving a host down to its base domain components?

Basically, I recognize that the host+domain portion of an email address may not be STRICTLY a domain, and may not itself appear in message headers ("list.nessus.org" is an example - the server their messages roll around on is "mail.nessus.org"). I'm using host and the subsequent parsing recipe to obtain the base domain to which a host belongs. host doesn't seem to want to return the SOA for a query at a non-domain level, which necessitates the -v option. That returns a lot more cruft, which in turn requires the parsing recipe to weed out the crap and actually find the SOA line, leaving us with just the domain portion, which we can then look for in the headers.
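
As one possible way to do the "find the SOA line" step in a single pass, here is a small awk filter. Treat it as a sketch under assumptions: the function name soa_zone is made up, and the sample input is an illustrative imitation of BIND 9 style "host -v" output for a reserved example name, not a captured lookup. It is still tied to the output format of host, so it does not remove that limitation.

```shell
# soa_zone  (hypothetical helper): reads "host -v -t SOA name" output on
# stdin and prints the owner name of the first SOA record, lowercased and
# without the trailing dot.
soa_zone() {
    awk '
        /^;/ { next }                  # skip comment and question-section lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "SOA") {     # record type column
                    zone = tolower($1)
                    sub(/\.$/, "", zone)
                    print zone
                    exit
                }
            }
        }
    '
}

# Illustrative input only; record values below are invented.
soa_zone <<'EOF'
Trying "list.example.org"
;; QUESTION SECTION:
;list.example.org.        IN      SOA

;; AUTHORITY SECTION:
example.org.    3600    IN      SOA     ns1.example.org. hostmaster.example.org. 1 10800 3600 604800 3600
EOF
# prints: example.org
```

Saved as a script on the PATH, something like this could sit on the right-hand side of a pipe in the BASEHOST backtick assignment, leaving the parsing recipes with nothing to do beyond checking for an empty result.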

Before someone suggests I simply grab the last two dot-separated tokens from the domain, please bear in mind that non-US domains often have domain.co.tld type syntax, and that's not consistent amongst all two-letter TLDs.
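
To make the objection concrete, here is the naive two-token approach as a throwaway shell function (naive_base is a made-up name, and the hostnames are just examples). It does fine on a .org host and falls over on a domain.co.tld style name:

```shell
# naive_base HOST  (hypothetical helper): keep only the last two
# dot-separated labels of HOST.
naive_base() {
    printf '%s\n' "$1" |
        awk -F. '{ if (NF >= 2) print $(NF-1) "." $NF; else print $0 }'
}

naive_base mail.nessus.org       # prints: nessus.org
naive_base mail.example.co.uk    # prints: co.uk  (a registry suffix, not a base domain)
```

Matching "co.uk" against the Received chain would pass nearly any UK mail, which is why the SOA lookup is used instead.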

As a future enhancement, I figure that the recipe _could_ obtain the MXs for a domain and attempt to check for those hosts/domains in the headers (yes, outbound mail hosts are not necessarily the same as inbound ones - but there's a good chance that if "putzwald.com" uses "mail.earthlink.net" as an MX, when that domain is legitimately sending mail, it may be going through an earthlink.net host). As a different example, "rcn.com" apparently really uses "rcn.net" for mail.
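
A first stab at the MX half of that idea might look like the following. It is a sketch only: mx_hosts is a made-up name, and it assumes the "mail is handled by" line format printed by the BIND 9 era host command, which other versions may not share. The sample line reuses the hypothetical putzwald.com case from above with an invented preference value.

```shell
# mx_hosts  (hypothetical helper): reads "host -t MX domain" output on
# stdin and prints each MX hostname, lowercased and without the trailing dot.
mx_hosts() {
    awk '/ mail is handled by / {
        h = tolower($NF)          # last field is the exchanger hostname
        sub(/\.$/, "", h)
        print h
    }'
}

# Illustrative input only.
printf '%s\n' 'putzwald.com mail is handled by 10 mail.earthlink.net.' | mx_hosts
# prints: mail.earthlink.net
```

Each hostname printed would still need reducing to its base domain (earthlink.net here) before checking the Received chain, so this adds at least one more lookup per MX.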

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
