procmail

no longer Re: Spam: ...life style...

1997-09-29 08:58:40
Era asked:

(BTW, wouldn't \<+ work better?)
  In my own recipes, I've used "life style" alone as a good sign that

Actually, \<* (or \>*) works better for most cases. I have received
spam in which the spammer has run words together, and this catches
those cases. The only time I might use \<+ is when the run-together
words change the meaning to something I might want to read. For
example, I might want to use "and\>+[0o]ver" instead of "and\>*[0o]ver"
if the enclosing context did not sufficiently distinguish which might
be meant. Again, there is typically no cost, and occasionally some
advantage, to preferring the * form over the + form.
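To make the difference concrete, here is a rough illustration in Python
rather than procmail. It assumes that procmail's \> can be approximated
by "one non-word character" (procmail also lets it match the newline at
the end of a line, which this sketch ignores). The names NONWORD, star,
and plus are made up for the example.

```python
import re

# Assumption: procmail's \> is approximated by a single non-word character.
NONWORD = r"[^A-Za-z0-9_]"

# procmail matches case-insensitively by default, hence re.IGNORECASE.
star = re.compile("and" + NONWORD + "*[0o]ver", re.IGNORECASE)  # like and\>*[0o]ver
plus = re.compile("and" + NONWORD + "+[0o]ver", re.IGNORECASE)  # like and\>+[0o]ver

for text in ["hand over the cash", "visit Andover", "and...0ver"]:
    print(f"{text!r}: star={bool(star.search(text))} plus={bool(plus.search(text))}")
```

Only the * form matches the run-together "Andover"; both forms match
when at least one non-word character separates the two words.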

In another followup in this thread, someone presented a recipe for
analyzing certain Received headers. That recipe used a regexp like
[a-z][-a-z0-9_.]* to identify what I believe was meant to be a host
name. There are three problems with this. First, host names are now
allowed to begin with numerals. Try, for example, a whois lookup
against the InterNIC db for 911.com. Second, I do not believe that
underscores are allowed in host names (but this, too, could have
changed). Finally, this regexp as I have reconstructed it allows
consecutive dots. It might be more accurate, though pedantic and
verbose (my forte), to define a host for this purpose with a regexp
like ([a-z0-9][-a-z0-9]*\.)+[a-z][a-z][a-z]+, as long as it is followed
by something else. Of course, if longer TLDs or numerals in TLDs are
allowed, this will have to change as well. Note that for other uses,
localhost may need to be explicitly allowed as well.
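
The two patterns can be compared side by side with a short Python
sketch. This is only an illustration: it uses Python's re module, not
procmail's matcher, and it uses fullmatch as a stand-in for the
"followed by something else" condition. The names OLD, NEW, and accepts
are invented for the example.

```python
import re

# The pattern as reconstructed from the earlier recipe.
OLD = re.compile(r"[a-z][-a-z0-9_.]*", re.IGNORECASE)
# The stricter pattern suggested above.
NEW = re.compile(r"([a-z0-9][-a-z0-9]*\.)+[a-z][a-z][a-z]+", re.IGNORECASE)

def accepts(pattern, name):
    """Return True if the whole of name matches pattern."""
    return pattern.fullmatch(name) is not None

names = ["911.com", "foo..com", "my_host.example.com", "localhost", "example.de"]
for name in names:
    print(f"{name:22} old={accepts(OLD, name)} new={accepts(NEW, name)}")
```

The old pattern rejects 911.com and accepts both foo..com and
my_host.example.com; the new one does the reverse. As written, the new
pattern also rejects localhost (no dot) and any name whose last label
has fewer than three letters, such as example.de.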

-- 
Rik Kabel          Old enough to be an adult              
rik(_at_)netcom(_dot_)com