procmail

RE: Get domain and tld ?

2009-01-25 13:23:38
I wrote yesterday:

On Sun, Jan 25, 2009 at 01:35:27AM +0100, Xavier Maillard wrote:

I want to get both tld and *last* part of the domain for any
processed email.

One thing you will want to clarify or think about is what you want
to do with addresses such as "example.co.uk".  If you take the "uk"
part as the TLD and then extract the next subpart, you end up with
"co"; but I suspect what you really would want is "example".  So
you have to work out your heuristic or logical ruleset for getting
the part you actually want.
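A minimal Python sketch of the naive rule ("take the label just left of the TLD") makes the problem concrete. Python and the helper name naive_domain_part are illustrative choices of mine, not anything from this thread:

```python
def naive_domain_part(fqdn: str) -> str:
    """Hypothetical helper: return the label just left of the last label."""
    labels = fqdn.rstrip(".").split(".")  # ignore a trailing root dot, if any
    if len(labels) < 2:
        return ""  # a bare label has nothing to the left of its TLD
    return labels[-2]


for name in ("panix5.panix.com", "jupiter.mars.example.co.uk"):
    print(name, "->", naive_domain_part(name))
```

The first name gives "panix", but the second gives "co", which is exactly the result you probably do not want.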


I did this in my old Virus Snaggers(tm) code, for example ...
My algorithm was to look at the FQDN, and: (a) if the TLD
was only two letters; AND (b) the FQDN was three or more parts;
AND (c) the penultimate (second-to-last) part was only two letters;
then take the third-from-the-right part as the "domain" part; but
otherwise, take the penultimate part as the "domain" part.
Yes, I realize that sounds complicated.  As I said, you have
to decide what the actual logic should be for what you want to
do.
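The three-condition rule above can be sketched in Python roughly as follows. This is a hypothetical re-implementation for illustration, not the Virus Snaggers code, and "two letters" is approximated as "two characters":

```python
def domain_part(fqdn: str) -> str:
    """Hypothetical helper applying the three-condition rule.

    If (a) the TLD has two characters, (b) the FQDN has three or more
    labels, and (c) the second-to-last label has two characters, return
    the third label from the right; otherwise return the second-to-last.
    """
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2:
        return ""  # no label to the left of the TLD
    tld, penultimate = labels[-1], labels[-2]
    if len(tld) == 2 and len(labels) >= 3 and len(penultimate) == 2:
        return labels[-3]  # country-style name such as example.co.uk
    return penultimate


if __name__ == "__main__":
    for name in ("panix5.panix.com", "foo.bar.panix5.panix.com",
                 "jupiter.mars.example.co.uk", "example.de"):
        print(f"{name} -> {domain_part(name)}")
```

Being a heuristic, it still misfires on names like "example.com.au", where it returns "com" because the penultimate label has three characters.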

[snipped]

Let's refine the procmail syntax given yesterday in light
of the wishes described above.

Here is a recipe-set test harness that includes code to
do what I think the OP wants.  I used this rcfile to
test, recursively, sample FQDNs that I supply on
the command line when starting the test:


 7:03pm [~] 614[0]> cat rc

##################### start rcfile #####################

 FQDN    = $1           # for testing on the command line

 :0
 * FQDN ?? ^^()^^
 { HOST }  # exit without delivery (lose any mail!) if no arg.
           # Repeating myself: this part is for testing only, not
           # production, because it will not deliver mail fed to it
  

 NL = '
' # define newline variable


 ###############################################
 # THIS IS THE START OF USEFUL PRODUCTION CODE #
 ###############################################

 # find last domain subpart; if country-style format, move
 # left one degree more

 :0
 * FQDN  ?? ()\/[^.]+[.]([^.][^.][.][^.][^.]|[^.][^.][^.]+)^^
 * MATCH ?? ^^\/[^.]+
 { DOMPART = $MATCH }



 # find TLD or country-style-format TLD.
 # Example: "com"; "org"; "co.uk"

 :0
 * $ FQDN ?? $\DOMPART[.]\/.+
 { TLD = $MATCH }
  
 ###############################################
 # THIS IS THE END OF PRODUCTION-CODE SECTION  #
 ###############################################



 LOG = "FQDN is >$FQDN<$NL"
 LOG = "DOMPART is >$DOMPART<$NL"
 LOG = "TLD is >$TLD<$NL"
 LOG = "---$NL"   # log iteration separator


 SHIFT = 1
 SWITCHRC = $_   # recurse


###################### end rcfile ######################


Let's run it.  I called the rcfile "rc", so:

  7:03pm [~] 615[0]> procmail -m rc $HOST foo.bar.$HOST jupiter.mars.example.co.uk < /dev/null
 FQDN is >panix5.panix.com<
 DOMPART is >panix<
 TLD is >com<
 ---
 FQDN is >foo.bar.panix5.panix.com<
 DOMPART is >panix<
 TLD is >com<
 ---
 FQDN is >jupiter.mars.example.co.uk<
 DOMPART is >example<
 TLD is >co.uk<


It works fine.  My only question is whether the TLD variable
should hold "co.uk", as I have done here, or just "uk" (as in
yesterday's code sample).
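The two choices differ only in how much of the right-hand side is kept once the domain part is known. A small Python sketch, assuming the domain part has already been found (the helper tld_forms and its "right-most matching label" convention are hypothetical):

```python
from typing import Tuple


def tld_forms(fqdn: str, dompart: str) -> Tuple[str, str]:
    """Hypothetical helper: return (long_tld, short_tld) for a known domain part.

    long_tld keeps every label to the right of the right-most label equal
    to dompart (e.g. "co.uk"); short_tld is only the last label (e.g. "uk").
    Raises ValueError if dompart is not one of the labels.
    """
    labels = fqdn.rstrip(".").split(".")
    # Position of the right-most label equal to dompart.
    i = len(labels) - 1 - labels[::-1].index(dompart)
    long_tld = ".".join(labels[i + 1:])
    short_tld = labels[-1]
    return long_tld, short_tld


print(tld_forms("jupiter.mars.example.co.uk", "example"))  # ('co.uk', 'uk')
print(tld_forms("panix5.panix.com", "panix"))              # ('com', 'com')
```

For a single-label TLD such as "com" the two forms coincide, so the choice only matters for country-style names.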

The recipe syntax the OP would be interested in is the part of
the above that starts with the comment "find last domain subpart"
and runs up to the LOG entries.
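For anyone who wants to check the production recipes outside procmail, here is a rough Python transliteration of the pattern in the first one. Python and the names DOMAIN_TLD_RE and split_fqdn are my own hypothetical choices; the sketch assumes procmail settles on the leftmost match, as Python's re.search does, which is consistent with the test output above.

```python
import re

# Mirrors the first recipe's condition: a label, a dot, then either a
# two-char.two-char suffix ("co.uk") or a single label of three or more
# characters ("com"), anchored at the end of the name (procmail's ^^).
DOMAIN_TLD_RE = re.compile(r"([^.]+)\.([^.]{2}\.[^.]{2}|[^.]{3,})$")


def split_fqdn(fqdn: str):
    """Hypothetical helper: return (DOMPART, TLD), or None if no match."""
    m = DOMAIN_TLD_RE.search(fqdn)  # search() returns the leftmost match
    if m is None:
        return None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    for name in ("panix5.panix.com", "foo.bar.panix5.panix.com",
                 "jupiter.mars.example.co.uk", "example.de"):
        print(name, "->", split_fqdn(name))
```

Unlike the second recipe, which searches FQDN again for "$DOMPART." and keeps the rest, the sketch takes the TLD straight from the same match; for the three sample names the results agree. A two-label name under a two-letter country code, such as example.de, fits neither alternative, so the sketch returns None; it is worth checking whether the recipe has the same gap.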

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
