procmail

Re: Get domain and tld ?

2009-01-24 20:29:30
On Sun, Jan 25, 2009 at 01:35:27AM +0100, Xavier Maillard wrote:
> I am struggling with something really simple: have multiple
> "match" from a single rule.
>
> I want to get both tld and *last* part of the domain for any
> processed email.
>
> For example: list.foobar.com would give TLD=com and
> DOMAIN=foobar.
>
> Is there any way to achieve this with one single rule ?

Yes.  Where are you getting the FQDN (fully-qualified domain name)
from?  Still the X-BeenThere field?  (It is often an email address
rather than just a FQDN.)

Anyway, wherever you're getting it from, once you have it in a
variable via the match token as I showed earlier with X-BeenThere,
you can extract whatever part you want.

One thing you will want to clarify or think about is what you want
to do with addresses such as "example.co.uk".  If you take the "uk"
part as the TLD and then extract the next subpart, you end up with
"co"; but I suspect what you really would want is "example".  So
you have to work out your heuristic or logical ruleset for getting
the part you actually want.


For example, I did this in my old Virus Snaggers(tm) code from 2004
(http://vsnag.spamless.us), and it all still works to a useful
extent today.  My algorithm was to look at the FQDN and check three
things: (a) the TLD is only two letters; (b) the FQDN has three or
more parts; and (c) the penultimate (second-to-last) part is only
two letters.  If all three hold, take the third-from-the-right part
as the "domain" part; otherwise, take the penultimate part.  So
"example.co.uk" gives "example", while "list.foobar.com" gives
"foobar".

Yes, I realize that sounds complicated.  As I said, you have to
decide what the actual logic should be for what you want to do.

Just taking the part before the TLD is easy.  Let's presume
it's stored in a var called "FQDN".  Here's my test rcfile:

################# start rcfile #################

  NL   = '
'
  FQDN = $HOST
  LOG  = "FQDN is >$FQDN<$NL"


 :0
 * FQDN  ?? ()\/[^.]+[.][^.]+^^
 * MATCH ?? ^^\/[^.]+
 { DOMAINPART = $MATCH }

 LOG = "DOMAINPART is >$DOMAINPART<$NL"

 HOST  # exit without delivery, for test purposes only

################## end rcfile ##################



Here is the test run:

  2:16am [~/Mail] 592[0]> procmail -m rc < /dev/null

 FQDN is >panix5.panix.com<
 DOMAINPART is >panix<


If you want to do more, such as the two-letter heuristic I
described before the sample code, you can download vsnag and look
at how I did that part, or you can ask more specifically here again.
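
For readers who want a starting point, here is a minimal, untested
sketch of that three-condition heuristic in the same style as the
test rcfile.  It is not the vsnag code.  It assumes FQDN is already
set as in the rcfile, reuses DOMAINPART, and reads "two letters" as
[a-z][a-z] (procmail matches case-insensitively unless the D flag
is given):

```procmail
# Sketch only, not taken from vsnag.
# (a) two-letter TLD, (b) three or more parts, (c) two-letter
# penultimate part: keep the third-from-the-right part.
:0
* FQDN  ?? ()\/[^.]+[.][a-z][a-z][.][a-z][a-z]^^
* MATCH ?? ^^\/[^.]+
{ DOMAINPART = $MATCH }

# Otherwise (E = previous recipe did not match): keep the
# penultimate part, exactly as in the simple recipe.
:0 E
* FQDN  ?? ()\/[^.]+[.][^.]+^^
* MATCH ?? ^^\/[^.]+
{ DOMAINPART = $MATCH }
```

The first pattern can only match a name with at least three parts,
so condition (b) needs no separate test.  Being a heuristic, it
misses registries with longer second levels such as "com.au", and
it will pick the wrong part for a short hostname label such as
"mail.hp.de".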

As for getting the TLD from the same FQDN var, I presume you
can copy that from the recipe set I offered earlier in this
thread.
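
That earlier recipe set is not reproduced in this message.  As a
stand-in, one plausible way to do it (an assumption, not necessarily
what was posted earlier) is to capture whatever follows the last dot
of FQDN.  TLD is an illustrative variable name, and NL is assumed
to be defined as in the test rcfile:

```procmail
# Sketch only: everything after the last dot of $FQDN.
:0
* FQDN ?? [.]\/[^.]+^^
{ TLD = $MATCH }

LOG = "TLD is >$TLD<$NL"
```

Because [^.]+ cannot cross a dot and ^^ anchors the end of the
string, the only dot that can satisfy the condition is the last one.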

Hope that helps,
Dallman
____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
