procmail

Hostname regexp (was Re: no longer Re: Spam: ...life style...)

1997-09-30 01:00:58
Talking to myself again:

On Mon, 29 Sep 1997 23:41:06 +0300 (EET DST), I wrote:
On Mon, 29 Sep 1997 11:54:55 -0400 (EDT), rik(_at_)netcom(_dot_)com (Rik Kabel) wrote:
In another followup in this thread, someone presented a recipe for
analyzing certain Received headers. That recipe used a regexp like
[a-z][-a-z0-9_.]* to identify what I believe was meant to be a host
name. There are three problems with this. First, host names are now
The full expression was ([a-z][-a-z0-9_.]*)* which is something
slightly different. The intention is to force the match to contain
+some+ alphabetic characters somewhere. Forcing the first character to

I guess one should probably break down and make that more stringent
for a real valid hostname, and something like ([a-z][-0-9_!@+]*\.?)+
or even more forgiving for a faked hostname.

changed). Finally, this regexp as I have reconstructed it allows
consecutive dots. It might be more accurate, though pedantic and

I note that the faked host name that started this thread actually
contained two consecutive dots. :^)
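
To make the consecutive-dots point concrete, here is a rough sketch in Python rather than procmail. It assumes Python's re module behaves like procmail's matcher for this simple pattern, uses IGNORECASE because procmail ignores case unless told otherwise, and anchors the match to the whole string. The sample names and the helper loose_accepts are made up for illustration.

```python
import re

# The loose pattern quoted above. IGNORECASE mimics procmail's default
# case-insensitive matching (an assumption about how it would be deployed).
LOOSE_HOST = re.compile(r"([a-z][-a-z0-9_.]*)*", re.IGNORECASE)


def loose_accepts(name: str) -> bool:
    """Return True if the whole string matches the loose pattern."""
    return LOOSE_HOST.fullmatch(name) is not None


# Hypothetical sample names, not taken from any real header.
for candidate in ("mail.example.com", "mail..example.com", "", "1abc"):
    print(repr(candidate), loose_accepts(candidate))
```

The dot sits inside the character class, so any run of dots passes once the first character is a letter; the outer * also lets the empty string through. Only the name with a leading digit is refused.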

verbose (my forte), to define a host for this purpose with a regexp
like ([a-z0-9][-a-z0-9]*\.)+[a-z][a-z][a-z]+, as long as it is followed

Meet the .fi domain. Oh, and don't forget the other 300 or so
two-letter TLDs. OTOH, allowing more than three letters for the TLD
also seems a bit too generous.
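
A quick way to see the two-letter problem, again sketched in Python under the assumption that its re module is close enough to procmail for this pattern. The check is anchored to the whole string, which leaves out whatever the quoted text required the name to be followed by. The helper name and the sample hosts are illustrative only.

```python
import re

# The pattern quoted above, which demands at least three letters in the TLD.
# IGNORECASE mimics procmail's default case-insensitive matching.
THREE_PLUS_TLD = re.compile(
    r"([a-z0-9][-a-z0-9]*\.)+[a-z][a-z][a-z]+", re.IGNORECASE
)


def three_plus_accepts(name: str) -> bool:
    """Return True if the whole string matches the three-or-more-letter TLD pattern."""
    return THREE_PLUS_TLD.fullmatch(name) is not None


# Sample names chosen for illustration.
for candidate in ("netcom.com", "www.iki.fi", "host.example.invalid"):
    print(candidate, three_plus_accepts(candidate))
```

The .com name passes, the .fi name is refused because its last label has only two letters, and a seven-letter final label passes because the trailing + puts no upper bound on the length.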

How's this: 

([a-z0-9](-?[a-z0-9])+\.)+[a-z0-9][a-z0-9][a-z0-9]?
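
For anyone who wants to poke at this before putting it in a recipe, a small Python sketch follows. It assumes Python's re module agrees with procmail on this pattern, turns on IGNORECASE to match procmail's default, and anchors to the whole string, which a real recipe would have to arrange by other means. The helper name and the sample hosts are made up.

```python
import re

# The proposed pattern: labels of two or more characters with no leading,
# trailing, or doubled hyphens, then a TLD of two or three characters.
HOST = re.compile(
    r"([a-z0-9](-?[a-z0-9])+\.)+[a-z0-9][a-z0-9][a-z0-9]?", re.IGNORECASE
)


def host_accepts(name: str) -> bool:
    """Return True if the whole string matches the proposed hostname pattern."""
    return HOST.fullmatch(name) is not None


# Illustrative samples only.
samples = (
    "www.iki.fi",         # two-letter TLD
    "netcom.com",         # three-letter TLD
    "mail..example.com",  # consecutive dots
    "foo-.example.com",   # label ending in a hyphen
    "foo--bar.com",       # doubled hyphen
    "a.com",              # single-character label
)
for candidate in samples:
    print(candidate, host_accepts(candidate))
```

The first two samples pass and the rest are refused. Note the last one: since (-?[a-z0-9])+ needs at least one repetition, every label before the TLD must be at least two characters long, so single-character labels are refused as well. Whether that is acceptable depends on the mail being filtered.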

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>