procmail

Re: Get domain and tld ?

2009-01-24 20:41:38
At 01:35 2009-01-25 +0100, Xavier Maillard wrote:
> I am struggling with something really simple: have multiple
> "match" from a single rule.

Keep struggling.  You need a series of rules.

> I want to get both tld and *last* part of the domain for any
> processed email.

From where?  A From: header, a Received line, a To:, what?

> For example: list.foobar.com would give TLD=com and
> DOMAIN=foobar.

Since you're not showing an email address, I'll presume you already have a recipe that assigns that list.foobar.com string to a variable. Let's say that it has been assigned to $FROMDOMAIN by some prior act on your part. For example's sake here, I'll deliberately assign it:

# note the recipe will still work even if this is "foobar.com"
FROMDOMAIN="list.foobar.com"

# first, match the domain down to JUST the rightmost two domain tokens
# (i.e. remove the optional hostname levels).  As parsed here, I'm allowing
# for the FROMDOMAIN to actually be an email address - this will still work.
:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
{
        # preserve the match result - you could repeat the above match
        # instead, but I prefer to do the work once.
        TOPDOMAIN=$MATCH

        # next, get the TLD portion.  You could use TOPDOMAIN here, but I'm
        # demonstrating that because MATCH still contains the result of the
        # prior match, you can use it as the source to match against.
        :0
        * MATCH ?? .*\.\/[^.]+$
        {
                TLD=$MATCH
        }

        # we need to fall back to the saved TOPDOMAIN and get the
        # domain portion (because the recipe above has truncated MATCH).
        :0
        * TOPDOMAIN ?? ^\/[^.]+
        {
                DOMAIN=$MATCH
        }
}
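
The recipe above presumes FROMDOMAIN has already been set. As a minimal sketch of that prior step, assuming the domain should come from the From: header (the original question doesn't say which header to use), something like this could replace the hard-coded assignment:

```procmail
# Assumption: take whatever follows the "@" in the From: header.
# Everything matched to the right of \/ ends up in MATCH.
:0
* ^From:.*@\/[-a-z0-9.]+
{
        FROMDOMAIN=$MATCH
}
```

Procmail matches case-insensitively by default, so the lowercase class also covers uppercase hostnames. A From: line carrying more than one "@" (in a quoted display name, say) may need a tighter pattern.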


BTW, you do realize that, outside of the generic TLDs such as .com, .org, .net, .biz, etc., many country-specific TLDs have their own second-level hierarchy? For example:

        host.demon.co.uk

Your desire to parse this will net you:
        DOMAIN=co
        TLD=uk

Which frankly, won't get you far.
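
If that case matters, one possible workaround is to special-case such suffixes before the generic extraction. This is only a sketch: the list of second-level labels (co, org, ac, gov) is a short illustrative assumption, not a complete list, and it reuses the FROMDOMAIN, TOPDOMAIN, DOMAIN and TLD names from the recipe above.

```procmail
# Hypothetical special case: a known second-level label followed by a
# two-letter country code is treated as one combined TLD.
:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.(co|org|ac|gov)\.[a-z][a-z]$
{
        # e.g. host.demon.co.uk leaves demon.co.uk here
        TOPDOMAIN=$MATCH

        # everything after the first dot: co.uk
        :0
        * TOPDOMAIN ?? ^[^.]+\.\/.+
        {
                TLD=$MATCH
        }

        # everything before the first dot: demon
        :0
        * TOPDOMAIN ?? ^\/[^.]+
        {
                DOMAIN=$MATCH
        }
}
```

This would have to run in place of the generic recipe, not in addition to it, for instance by putting it first and giving the generic recipe the E flag (:0 E) so that it only fires when this one did not match.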

> Is there any way to achieve this with one single rule ?

While a series of conditions in a rule could acquire a match and then re-use that match to acquire a subsequent match (as above), the original match is then lost. You must assign the result (within the action portion), and then run a new match condition (as a subsequent rule or a nested rule - see above).

This is (AFAIK) the most concise way to write the above extraction:

:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
* MATCH ?? .*\.\/[^.]+$
{
        TLD=$MATCH
}

:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
* MATCH ?? ^\/[^.]+
{
        DOMAIN=$MATCH
}

However, having two identical initial extractions is icky: if you later determine that you need to update the pattern to deal with some funky variation, you need to remember to update them BOTH. The nested approach doesn't have that problem. The trade-off is that the nested approach needs an intermediate variable to hold the initial result so that it can be reused.


I must wonder, is this an extension of trying to sort list mail automatically? Have you seen the listname_id.rc ruleset? Search the archives.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

