procmail

Re: no longer Re: Spam: ...life style...

1997-09-29 13:53:45
On Mon, 29 Sep 1997 11:54:55 -0400 (EDT), rik(_at_)netcom(_dot_)com (Rik Kabel)
wrote:
> In another followup in this thread, someone presented a recipe for
> analyzing certain Received headers. That recipe used a regexp like
> [a-z][-a-z0-9_.]* to identify what I believe was meant to be a host
> name. There are three problems with this. First, host names are now

The full expression was ([a-z][-a-z0-9_.]*)*, which is something
slightly different. The intention is to force the match to contain
+some+ alphabetic characters somewhere. Forcing the first character to
be alphabetic is a compromise; you could also force it to be the last
one, or allow numbers everywhere (not desirable because I want very
much to not inadvertently match on the IP number which follows), or
write a more complicated expression which will most likely constrain
the possible matches a bit more, which may or may not be good.
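The trade-off described above can be tried out in Python. This is only a sketch: Python's re module is not procmail's regex engine, re.IGNORECASE stands in for procmail's default case-insensitive matching, and the "digits anywhere" pattern is a hypothetical comparison, not something posted in the thread.

```python
import re

# The expression from the thread: every repetition of the group must
# start with a letter, so a token beginning with a digit is rejected.
# re.IGNORECASE approximates procmail's default case-insensitive matching.
ALPHA_FIRST = re.compile(r"([a-z][-a-z0-9_.]*)*", re.IGNORECASE)

# Hypothetical "allow numbers everywhere" variant, for comparison only.
DIGITS_ANYWHERE = re.compile(r"[-a-z0-9_.]+", re.IGNORECASE)


def whole_token_matches(pattern, token):
    """True if pattern covers the entire, non-empty token."""
    # The outer * lets ALPHA_FIRST match the empty string, so empty
    # tokens are excluded explicitly.
    return bool(token) and pattern.fullmatch(token) is not None


for token in ["mail.example.com", "Relay-2.example.net", "192.0.2.1"]:
    print(token,
          whole_token_matches(ALPHA_FIRST, token),
          whole_token_matches(DIGITS_ANYWHERE, token))
```

The dotted IP address is rejected by the first pattern but accepted by the second, which is why allowing numbers everywhere is undesirable here. In a backtracking engine such as Python's, the nested * can also get very slow on a long token that ultimately fails to match, so it is best applied to short, already isolated fields.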
  The context was matching forged domain names, which can and will
contain illegal characters (including !, as I discovered; I also
added @ to the second character class).
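The augmented expression might look like the Python sketch below. The exact character class is not spelled out above, so adding both ! and @ to the second class is an assumption, and the sample forged names are made up for illustration.

```python
import re

# Original expression from the thread.
ORIGINAL = re.compile(r"([a-z][-a-z0-9_.]*)*", re.IGNORECASE)

# Assumed augmented version: ! and @ added to the second character class
# so that forged names containing them are still captured.
AUGMENTED = re.compile(r"([a-z][-a-z0-9_.!@]*)*", re.IGNORECASE)


def captures(pattern, name):
    """True if pattern matches the whole, non-empty name."""
    return bool(name) and pattern.fullmatch(name) is not None


# Hypothetical forged names of the kind described in the post.
for name in ["relay!example.com", "user@bogus.example", "mail.example.com"]:
    print(name, captures(ORIGINAL, name), captures(AUGMENTED, name))
```

Because only the second class is widened, the first character must still be a letter, so a name that starts with ! or @ is still rejected.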

> changed). Finally, this regexp as I have reconstructed it allows
> consecutive dots. It might be more accurate, though pedantic and
> verbose (my forte), to define a host for this purpose with a regexp
> like ([a-z0-9][-a-z0-9]*\.)+[a-z][a-z][a-z]+, as long as it is followed
> by something else. Of course, if longer TLDs or numerals in TLDs are
> allowed, this will have to change as well. Note that for other uses,
> localhost may need to be explicitly allowed as well.

These are all good observations for anyone attempting to limit matches
to valid hostnames only. Unfortunately, the forged ones are not always
easy to grab (you can always expect a clever or just incompetent
forger to break yet another one of your assumptions).
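For the valid-hostname case, Rik's stricter pattern can be compared with the reconstructed one in Python. Again this is a sketch: fullmatch() on isolated tokens stands in for the "followed by something else" condition, and allow_localhost is a hypothetical switch for the localhost exception. As written, the final [a-z][a-z][a-z]+ requires a top-level domain of at least three letters, so two-letter country codes such as .fi are rejected.

```python
import re

# The pattern as reconstructed in Rik's message; it accepts consecutive dots.
RECONSTRUCTED = re.compile(r"[a-z][-a-z0-9_.]*", re.IGNORECASE)

# Rik's stricter pattern: one or more labels, each ending in a dot,
# followed by an alphabetic top-level domain of three or more letters.
STRICT = re.compile(r"([a-z0-9][-a-z0-9]*\.)+[a-z][a-z][a-z]+", re.IGNORECASE)


def looks_like_host(token, allow_localhost=False):
    """True if the whole token fits STRICT.

    allow_localhost is a hypothetical switch for uses where the bare
    name "localhost" must be accepted explicitly.
    """
    if allow_localhost and token.lower() == "localhost":
        return True
    return STRICT.fullmatch(token) is not None


for token in ["mail.example.com", "mail..example.com", "localhost", "www.iki.fi"]:
    print(token,
          RECONSTRUCTED.fullmatch(token) is not None,
          looks_like_host(token),
          looks_like_host(token, allow_localhost=True))
```

The reconstructed pattern accepts all four tokens, while the strict one accepts only mail.example.com, plus localhost when explicitly allowed. None of this helps much against forged names, which is exactly the point made above.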

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>