Someone might want to tinker with this. I'm just doing some evaluation of
the approach, and there are a number of limitations (though I don't think
most of them completely cripple it, so long as one uses a spammishness
scoring approach and doesn't weight this test too heavily).
Pros:
 - corporate users generally identified OK
 - users of large ISPs generally identified OK
 - LOTS of spew fairly consistently tagged
Cons:
 - People who have their own domain but don't actually use a dedicated
   host for it (i.e. neither their own system nor their mailhost bears
   the name of their own domain) show up as false positives. You know
   who you are.
 - People posting through lists that purge the Received: headers from
   before list delivery (IMO a bad practice, since it hampers downline
   abuse management) will be flagged as false positives.
 - It involves an external process, including a DNS lookup, just to
   reduce the host to its base domain (yeah, not even a DNSBL lookup,
   just a host parse).
 - It relies upon the output syntax of that external process. An upgrade
   to BIND could change the format (as it has in the past).
 - ISPs operating under multiple domains (e.g. rcn.com + rcn.net) but
   using mailhosts named for only one of them get flagged when the
   From: address uses the other.
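# Ask host for the SOA of the From: domain. -v is needed because host
# doesn't seem to print the SOA for a name below the domain level without it.
BASEHOST=`host -v -t SOA $FROM_DOMAIN`

# Grab the owner name from the SOA line (it ends in a dot), then trim the
# trailing dot. Assumes space-separated output; the format varies by BIND
# version.
:0
* BASEHOST ?? ^\/[-_.a-z0-9]*\.[ ].*[ ]SOA[ ]
* MATCH ?? ^\/[-_.a-z0-9]*[^ .]
{
BASEHOST=$MATCH
}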
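# If the parse failed, empty BASEHOST so the raw host output never gets
# used as a pattern below.
:0E
{
BASEHOST=
}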
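# Yeah, Received: headers are easy to forge, but we check anyway.
# Score 1, minus 1 if any Received: line names BASEHOST as a whole word
# ($\ quotes regex metacharacters); the action runs only when the total
# stays positive, i.e. when BASEHOST never shows up in the chain.
:0
* ! BASEHOST ?? ^^^^
* 1^0
* $ -1^0 Received:.*\>$\BASEHOST\>
{
SPAMVAL="+75"
SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} from_domain not present in Received chain${NL}"
}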
FROM_DOMAIN gets set in my standard header extractions (which you can find
in my sandbox).
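For anyone who wants to tinker outside procmail, here is a rough Python sketch of what the scoring recipe decides. It is an illustration, not a drop-in replacement: the function names are made up, the Received header values are assumed to be already extracted (and unfolded) by the caller, and the lookarounds only approximate procmail's \< and \> word-boundary tokens.

```python
import re

SPAMVAL = 75  # same weight as the recipe's SPAMVAL="+75"


def from_domain_missing(base_host: str, received: list[str]) -> bool:
    """True when base_host is set but appears in none of the Received values."""
    if not base_host:
        # Empty BASEHOST means the lookup/parse failed, so skip the test.
        return False
    # re.escape() stands in for procmail's $\BASEHOST regex quoting; the
    # lookarounds approximate procmail's \< and \> word boundaries.
    token = re.compile(
        r"(?<!\w)" + re.escape(base_host) + r"(?!\w)", re.IGNORECASE
    )
    return not any(token.search(value) for value in received)


def score(base_host: str, received: list[str]) -> tuple[int, str]:
    """Return (points, note) in the spirit of SPAMMISHNESS/SPAMNOTES."""
    if from_domain_missing(base_host, received):
        return SPAMVAL, f"SPAM: +{SPAMVAL} from_domain not present in Received chain"
    return 0, ""
```

Called with a base domain of "nessus.org" and a Received value naming mail.nessus.org, score() returns (0, ""); with a domain that never appears in the chain it returns 75 points plus the note. Like the recipe, "nessus.org" does not match inside "nessus.organic.example", because the next character is a word character.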
Is anyone aware of a more streamlined method of resolving a host down to
its base domain components?
Basically, I recognize that the host+domain portion of an email address may
not STRICTLY be a domain, and may not itself appear in the message headers
("list.nessus.org" is an example: the server their messages roll around on
is "mail.nessus.org"). I'm using host and the subsequent parsing recipe to
obtain the base domain to which a host belongs. host doesn't seem to want
to return the SOA for a name below the domain level (for such a name, the
zone's SOA only comes back in the authority section of the reply), which
necessitates the -v option. That returns a lot more cruft, which in turn
requires the parsing recipe to weed out the crap and find the actual SOA
line, leaving us with just the domain portion, which we can then look for
in the headers.
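A Python sketch of that parsing step, for comparison. It folds the recipe's two MATCH passes into one capture group and, unlike the recipe's [ ] classes, accepts tabs as well as spaces. The sample output is hand-written for illustration (hypothetical example.org names); real host -v output differs between BIND versions, which is exactly the fragility listed under Cons.

```python
import re
from typing import Optional

# An owner name ending in a dot, a space or tab, anything on the same line,
# then an SOA type field. The capture group drops the trailing dot, like the
# recipe's second MATCH pass.
SOA_LINE = re.compile(
    r"^([-_.a-z0-9]*[^\s.])\.[ \t].*[ \t]SOA[ \t]", re.IGNORECASE | re.MULTILINE
)


def base_domain_from_host_output(text: str) -> Optional[str]:
    """Return the zone name from the first SOA line, or None if none is found."""
    match = SOA_LINE.search(text)
    return match.group(1).lower() if match else None


# Hypothetical, hand-written sample in the general shape of `host -v -t SOA` output.
SAMPLE = (
    ";; QUESTION SECTION:\n"
    ";list.example.org.\t\tIN\tSOA\n"
    "\n"
    ";; AUTHORITY SECTION:\n"
    "example.org.\t3600\tIN\tSOA\tns1.example.org. hostmaster.example.org. "
    "1 7200 3600 1209600 3600\n"
)

print(base_domain_from_host_output(SAMPLE))  # example.org
```

The question-section line is skipped because its leading ";" can't be followed by the required dot, the same reason the procmail pattern passes over it.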
Before someone suggests I simply grab the last two dot-separated tokens
from the domain, please bear in mind that non-US domains often have
domain.co.tld-style syntax, and that's not consistent across all two-letter
TLDs.
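To make the pitfall concrete, a small Python sketch of the last-two-tokens shortcut and an equally naive "keep three labels for two-letter TLDs" patch. Both are hypothetical helpers, shown only to fail: the first turns mail.example.co.uk into co.uk, and the second turns mail.example.de into mail.example.de, since .de registrations sit directly under the TLD.

```python
def last_two_labels(host: str) -> str:
    """The tempting shortcut: keep the last two dot-separated labels."""
    labels = host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def cc_tld_patch(host: str) -> str:
    """Patched shortcut: keep three labels whenever the TLD has two letters."""
    labels = host.rstrip(".").lower().split(".")
    keep = 3 if len(labels[-1]) == 2 else 2
    return ".".join(labels[-keep:])


for host in ("mail.nessus.org", "mail.example.co.uk", "mail.example.de"):
    print(f"{host:20} {last_two_labels(host):15} {cc_tld_patch(host)}")
```

Each rule is right for one of the two-letter TLDs and wrong for the other, which is why the recipe asks DNS for the SOA instead of counting dots.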
As a future enhancement, I figure that the recipe _could_ obtain the MXes
for a domain and attempt to check for those hosts/domains in the headers
(yes, outbound mail hosts are not necessarily the same as inbound ones,
but there's a good chance that if "putzwald.com" uses "mail.earthlink.net"
as an MX, then when that domain is legitimately sending mail, it may well be
going through an earthlink.net host). As a different example, "rcn.com"
apparently really uses "rcn.net" for mail.
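A rough Python sketch of that enhancement's header check, under loud assumptions: the MX lookup itself is left out, so the caller is assumed to supply the MX hosts already reduced to base domains (for instance with the same SOA trick), and the Received value in the example is invented. The check passes if the From: domain or any MX domain shows up as a whole token in the Received chain.

```python
import re
from typing import Iterable


def any_domain_in_received(domains: Iterable[str], received: Iterable[str]) -> bool:
    """True if any candidate domain appears as a whole token in a Received value."""
    headers = list(received)  # iterated once per candidate domain
    for domain in domains:
        token = re.compile(
            r"(?<!\w)" + re.escape(domain) + r"(?!\w)", re.IGNORECASE
        )
        if any(token.search(value) for value in headers):
            return True
    return False


# Hypothetical message claiming to be From: someone@putzwald.com, whose MX
# (per the example in the text) is mail.earthlink.net -> base domain earthlink.net.
candidates = ["putzwald.com", "earthlink.net"]
# Invented Received value; smtp-out.earthlink.net is a made-up outbound host.
hops = ["from smtp-out.earthlink.net (smtp-out.earthlink.net [192.0.2.7]) by mx.example.net"]
print(any_domain_in_received(candidates, hops))        # True
print(any_domain_in_received(["putzwald.com"], hops))  # False
```

The same shape covers the rcn.com/rcn.net case: pass both names as candidates.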
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail