At 11:31 2011-08-18, LuKreme wrote:
LuKreme <kremels(_at_)kreme(_dot_)com> squawked out on Thursday
18-Aug-2011@12:14:46
> So, I started to think (dangerous, I know) and I searched and
> found Sean's post from a few years ago about dealing with
> getting domains in domain.co.uk sorts of situations:
> And a few minutes later I found Dan's post in the same thread with
> (trimmed down to just the part I want)
> TLDREGEX = ([cC][oO][.][^.][^.]|[^.]+)
The pattern doesn't need to spell out both cases: procmail matches
case-insensitively unless someone explicitly makes a recipe case
sensitive by specifying the 'D' flag. The following is more succinct,
and accomplishes the same thing within the example recipe:
TLDREGEX = (co[.][^.][^.]|[^.]+)
Note that the [.] expression might more commonly be written as \.
but one would have to double-escape it to \\. for the backslash to
appear in the resulting regexp string, so putting the dot in a
character class is in fact clearer.
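
For anyone who wants to convince themselves of the equivalence without
touching a live .procmailrc, here is a rough sketch in Python. It is
only an approximation: Python's re module is not procmail's regexp
engine, and the helper name accepts() and the sample strings are made
up for illustration. re.IGNORECASE stands in for procmail's default
matching, and leaving it off stands in for the 'D' flag.

```python
import re

# The two forms of TLDREGEX discussed above.
SPELLED_OUT = r"([cC][oO][.][^.][^.]|[^.]+)"
SUCCINCT = r"(co[.][^.][^.]|[^.]+)"

def accepts(pattern, text, case_sensitive=False):
    """True if pattern matches the whole of text.

    case_sensitive=False approximates procmail's default behaviour;
    case_sensitive=True approximates a recipe carrying the 'D' flag.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(pattern, text, flags) is not None

# Illustrative samples only (assumption, not taken from real mail).
SAMPLES = ["co.uk", "CO.UK", "Co.Jp", "com", "COM", "co.info"]

# Case-insensitive matching: both forms agree on every sample.
AGREE_BY_DEFAULT = all(
    accepts(SPELLED_OUT, s) == accepts(SUCCINCT, s) for s in SAMPLES
)

# With case sensitivity forced, only the spelled-out form takes "CO.UK".
DIFFER_WITH_D = (
    accepts(SPELLED_OUT, "CO.UK", True) != accepts(SUCCINCT, "CO.UK", True)
)

# [.] matches a literal dot only, just as an escaped dot would.
DOT_CLASS_IS_LITERAL = (
    accepts(r"co[.]uk", "co.uk") and not accepts(r"co[.]uk", "coXuk")
)
```

So the shorter form only changes behaviour if someone adds 'D' to the
recipe, which is the caveat noted above.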
> # Get the domain name
> :0
> * $ FQDN ?? ()\/[^.]+[.]$TLDREGEX^^
> * MATCH ?? ^^\/[^.]+
> { DOMPART = $MATCH }
> This works perfectly as far as I can tell.
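
To make the two condition lines easier to follow, here is a rough
Python emulation of them. Treat it as a sketch under assumptions, not
as what procmail does internally: it assumes a leftmost match for the
unanchored first condition, uses re.IGNORECASE for procmail's default
case handling, and the function name dompart() is invented here.

```python
import re

# Succinct TLDREGEX from the thread, as a non-capturing group.
TLDREGEX = r"(?:co[.][^.][^.]|[^.]+)"

# First condition: everything after procmail's \/ marker lands in MATCH;
# the trailing ^^ anchors the match at the end of the string.
FIRST = re.compile(r"[^.]+[.]" + TLDREGEX + r"\Z", re.IGNORECASE)

# Second condition: keep only the leading label of MATCH.
SECOND = re.compile(r"\A[^.]+")

def dompart(fqdn):
    """Approximate DOMPART for a host name, or None if condition one fails."""
    first = FIRST.search(fqdn)
    if first is None:
        return None
    return SECOND.search(first.group(0)).group(0)
```

With this emulation, dompart("mail.domain.co.uk") and
dompart("domain.tld") both give "domain", while a bare "localhost"
gives None because there is no dot for the first condition to match.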
The second condition line drops the TLD portion(s), so this will grab
the "domain" from "domain.tld", "mail.domain.tld", or
"mail.domain.co.uk". However, the TLDREGEX is a 'co.xx' specific
expression. It'll trip up on something such as 'k12.ca.us' (and there
are many variations on that): for a host in the "k12.ca.us" hierarchy
(California schools, http://www.ed-data.k12.ca.us/) it will get "ca"
rather than "ed-data". ca.us is also used for municipalities within
the state, nv.us is Nevada, and predictably, other states use the
same syntax. Then there's ca.gov, with a host of subdomains including
some municipalities and agencies (cdfa.ca.gov, etc.).
The UK has "org.uk" and "net.uk" as well.
Admittedly, you're not likely to be ordering anything from .ca.us and
the like, but in the context of parsing out a domain, there are many
issues raised.
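
One possible direction, sketched in Python rather than as a recipe, is
to look the suffix up in a table instead of encoding it in a regexp.
This is not anything posted earlier in the thread: the table
KNOWN_SUFFIXES and the function dompart_from_table() are hypothetical,
and the table holds only the suffixes named in this message, so it is
nowhere near complete.

```python
# Hypothetical, hand-maintained table; only suffixes named in this thread.
KNOWN_SUFFIXES = {
    "co.uk", "org.uk", "net.uk",
    "k12.ca.us", "ca.us", "nv.us",
    "ca.gov",
}

def dompart_from_table(fqdn, suffixes=KNOWN_SUFFIXES):
    """Return the label just left of the longest known suffix.

    Falls back to treating the final label as the TLD when no table
    entry applies, and returns None for a name with no dot in it.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # k12.ca.us is found before ca.us.
    for start in range(1, len(labels)):
        if ".".join(labels[start:]) in suffixes:
            return labels[start - 1]
    return labels[-2] if len(labels) >= 2 else None
```

With that table, dompart_from_table("www.ed-data.k12.ca.us") gives
"ed-data" and dompart_from_table("cdfa.ca.gov") gives "cdfa". The
obvious cost is that somebody has to keep the table current.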
Considering the ICANN decision to open up the TLD naming to pretty
much anything, some thought needs to be put into how domains are
parsed. There's sure to be a LOT of logic that will break.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.