host query optimization?

Someone might want to tinker with this - I'm just doing some evaluation of the process, and there are a number of limitations (most of which I don't think completely cripple it, so long as one uses a spammishness approach and doesn't set the scoring too high for this test).



Pros:
        corporate users generally identified ok

        users of large ISPs generally identified ok

        LOTS of spew fairly consistently tagged

Cons:
        People who have their own domain, but don't actually use a dedicated
        host for it (i.e. neither their own system nor their mailhost actually
        bears the name of their own domain) show as false pozzie.  You know
        who you are.

        People posting through lists which purge Received: headers from prior
        to list delivery (IMO, a bad practice, since it reduces downline
        abuse management) will be flagged false pozzie.

        Involves external process incl DNS lookup just to condense the
        host to the base domain component (yea, not even a DNSBL lookup, just
        a host parse).

        Relies upon the syntax of the external process.  An upgrade to bind
        could change the format (as it has in the past).

        ISPs operating under multiple aliases (rcn.com + rcn.net) but using
        mailhosts specific to the one domain.


BASEHOST=`host -v -t SOA $FROM_DOMAIN`

:0
* BASEHOST ?? ^\/[-_.a-z0-9]*\.[        ].*[    ]SOA[   ]
* MATCH ?? ^\/[-_.a-z0-9]*[^    .]
{
        BASEHOST=$MATCH
}

:0E
{
        BASEHOST
}

# yea, easy to forge, but we check anyway.
:0
* ! BASEHOST ?? ^^^^
* 1^0
* $ -1^0 Received:.*\>$\BASEHOST\>
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"

        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} from_domain not present in Received chain${NL}"
}

FROM_DOMAIN gets set in my standard header extractions (which you can find in my sandbox).

Is anyone aware of a more streamlined method of resolving a host down to its base domain components?

Basically, I recognize that the host+domain portion of an email address may not be STRICTLY a domain, and may not itself appear in message headers ("list.nessus.org" is an example - the server their messages roll around on is "mail.nessus.org"). I'm using host and the subsequent parsing recipe to obtain the base domain to which a host belongs. host doesn't seem to want to return the SOA for a query at a non-domain level, necessitating the -v option, which returns a lot more cruft. That in turn requires the parsing recipe to weed out the crap and actually find the SOA line, leaving us with just the domain portion, which we can then look for in the headers.
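
For anyone who would rather do the SOA extraction outside of procmail, here is a rough shell sketch of the same parse. It assumes the SOA record shows up in zone-file style (owner name with a trailing dot, whitespace, then SOA as its own field), which is the same assumption the recipe makes and which may not hold for every version of host. The function name soa_owner is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the owner name of the first SOA record found in
# zone-file style text on stdin. The input layout is an assumption;
# the output of host -v has changed between bind releases.
soa_owner() {
    awk '
        # Owner name ending in a dot, whitespace, then SOA as a separate field.
        /^[-_.A-Za-z0-9]+\.[ \t].*[ \t]SOA[ \t]/ {
            name = tolower($1)
            sub(/\.$/, "", name)    # drop the trailing root dot
            print name
            exit                    # first SOA line wins
        }
    '
}

# Intended use (needs network access and a host binary):
#   host -v -t SOA list.nessus.org | soa_owner
```

If no SOA line is found, the function prints nothing, which corresponds to the recipe leaving BASEHOST unset.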

Before someone suggests I simply grab the last two dot-separated tokens from the domain, please bear in mind that non-US domains often have domain.co.tld type syntax, and that's not consistent amongst all two-letter tlds.
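
To illustrate why the two-token shortcut falls over, here is a throwaway shell sketch. The function name naive_base and the example hostnames are mine, purely for demonstration.

```shell
#!/bin/sh
# Sketch only: the "last two dot-separated tokens" shortcut.
naive_base() {
    printf '%s\n' "$1" | awk -F. '{
        if (NF >= 2) print $(NF-1) "." $NF   # keep the final two labels
        else         print $0                # single label: pass through
    }'
}

naive_base mail.nessus.org       # prints nessus.org (what we want)
naive_base mail.example.co.uk    # prints co.uk (not a usable base domain)
```

The second result would match the Received: headers of anyone under co.uk, which makes the test worthless for those domains.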

As a future enhancement, I figure that the recipe _could_ obtain the MX's for a domain and attempt to check for those hosts/domains in the headers (yes, outbound mail hosts are not necessarily the same as inbound ones - but there's a good chance that if "putzwald.com" uses "mail.earthlink.net" as an MX, then when that domain is legitimately sending mail, it may be going through an earthlink.net host). As a different example, "rcn.com" apparently really uses "rcn.net" for mail.
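
A rough sketch of how the MX side of that might look in shell. This is untested against real mail flow, and it swaps in dig rather than host, on the assumption that dig +short MX prints one "preference target." pair per line. The function name mx_hosts is invented for illustration.

```shell
#!/bin/sh
# Sketch only: read "preference target." lines on stdin (the layout assumed
# for dig +short MX) and print each target once, lowercased and without
# the trailing root dot.
mx_hosts() {
    awk 'NF >= 2 {
        h = tolower($2)
        sub(/\.$/, "", h)    # drop the trailing root dot
        print h
    }' | sort -u
}

# Intended use (needs network access and a dig binary):
#   dig +short MX putzwald.com | mx_hosts
```

Each host that comes back would still need reducing to its base domain (the same SOA lookup as for FROM_DOMAIN) before being checked against the Received: headers, so this adds one more external lookup per MX.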


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail