procmail

Re: RFC-consistent regexp to match name@(subdomain.)*foo.bar

1997-02-19 23:50:02
Stan Ryckman <stanr(_at_)sunspot(_dot_)tiac(_dot_)net> writes:
At 03:50 PM 2/19/97 -0800, Rob Perelman wrote:
On Wed, 19 Feb 1997, Stan Ryckman wrote:

I think you're asking for something *extremely messy* in a regexp.
Consider you'll have to match:

<some really ugly addresses>

Strange...here are my findings.  This is according to a 4-page regex
written by Tom Christiansen, author of Perl.

TomC is not the author of perl -- Larry Wall is.  Tom is just one of
the foremost promoters of it.  Not to be sacrilegious, but if Larry Wall
is the Father, Tom is the Son, and Randal Schwartz is the Holy Ghost.

As for the regexp, there's a small problem: it's impossible to match
all RFC 822 addresses with a regular expression.  Why?  RFC 822
comments are written in parentheses and may nest, and regular
expressions can't match balanced parens to unlimited depth.  To do so
requires a pushdown automaton instead of the simple finite automaton of
a regular expression engine.  If you check the documentation on that
nasty regexp closely, you'll find a comment to the effect that nested
parens are only recognized to a depth of 2 (I think).  Deeper than that
and they won't be matched correctly.  Don't bother mailing Tom, he
already knows.
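
To make the depth limit concrete, here is a rough sketch in Python
(not procmail).  The names ONE_LEVEL and parens_balanced are made up
for illustration, and the scanner is a simplification: it handles
backslash-quoted characters but ignores quoted strings, which a real
RFC 822 parser would also have to deal with.  The regexp is hard-wired
to one level of nesting; the function keeps a counter, which is the
part a finite automaton has no room for.

```python
import re

# A regexp that accepts a parenthesized comment with at most ONE nested
# level.  Each additional level has to be spelled out by hand, so any
# fixed pattern gives up at some depth.
ONE_LEVEL = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def parens_balanced(text: str) -> bool:
    """Return True if every '(' in text has a matching ')'.

    Simplified: backslash-quoted characters are skipped, but quoted
    strings are not treated specially.
    """
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it quotes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' with no '(' open
        i += 1
    return depth == 0


if __name__ == "__main__":
    print(bool(ONE_LEVEL.fullmatch("(a (b))")))      # within the fixed depth
    print(bool(ONE_LEVEL.fullmatch("(a (b (c)))")))  # one level too deep
    print(parens_balanced("(a (b (c (d))))"))        # counter copes with any depth
```

The counter is the whole difference: it can grow without bound, while
the pattern has only as many levels as someone typed into it.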


I deduce that even 4 pages isn't enough!  :-)

How about aleph-naught pages?

(For those who aren't mathematicians, aleph-naught (as in the Hebrew
letter aleph subscripted with a zero) is the smallest infinite cardinal,
being the cardinality of the integers.  When most people talk of
infinity, this is generally what they mean ("if you count forever...")).


Anyway, the moral of all this is that you *can't* write a condition
in procmail that matches every possible RFC 822 form of an address in
the foo.bar domain.  However, you usually don't need to.  sendmail
should be giving you a much simpler address to check in at least two
other places: the Received: headers, and the Return-Path: header or
the "From " non-header.  The hostnames in the former are straight out
of gethostbyaddr(), while the address in the latter two is from the
envelope and should be in minimal form.  I would thus suggest
something along the lines of either of the following conditions:

* ^(From |Return-Path:).*@([-a-z0-9]+\.)*foo\.bar([^-a-z0-9.]|$)

* ^Received:.*[^-a-z0-9]foo\.bar([^-a-z0-9.]|$)
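
If you want to see what those two patterns accept before trusting
them with real mail, something like the following Python sketch may
help.  It is only an approximation: Python's re module is not
procmail's regexp engine, the function name from_foo_bar and the
sample header lines are invented for the test, and I'm assuming each
header is checked as a single unfolded line.  re.IGNORECASE stands in
for procmail's default case-insensitive matching.

```python
import re

# The two conditions from above, minus the leading "* ".
ENVELOPE = re.compile(
    r"^(From |Return-Path:).*@([-a-z0-9]+\.)*foo\.bar([^-a-z0-9.]|$)",
    re.IGNORECASE,
)
RECEIVED = re.compile(
    r"^Received:.*[^-a-z0-9]foo\.bar([^-a-z0-9.]|$)",
    re.IGNORECASE,
)


def from_foo_bar(header_line: str) -> bool:
    """Return True if one unfolded header line matches either pattern."""
    return bool(ENVELOPE.search(header_line) or RECEIVED.search(header_line))


if __name__ == "__main__":
    samples = [
        "Return-Path: <joe@mail.foo.bar>",          # subdomain of foo.bar
        "From joe@foo.bar Wed Feb 19 23:50:02 1997",
        "Received: from mail.foo.bar (mail.foo.bar [10.0.0.1]) by x.example",
        "Return-Path: <joe@notfoo.bar>",            # different domain
        "Return-Path: <joe@foo.bar.example.com>",   # foo.bar is not the tail
    ]
    for line in samples:
        print(from_foo_bar(line), line)
```

The trailing ([^-a-z0-9.]|$) is what keeps foo.bar.example.com and
foo.barn from sneaking through, and the [^-a-z0-9] in the Received:
pattern does the same job on the left-hand side.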


Philip Guenther