procmail

Re: if To: has a given address and nothing else (LONG!)

2000-01-24 23:19:29
"David W. Tamkin" <dattier(_at_)mcs(_dot_)net> writes:
Here's what I'm trying to figure out.

I want to match unless every address in To: matches a pattern.  That will
probably require a fork of formail -zx if there is more than one To: header,
and it will also require comment stripping or some bullet-proof comment
ignoring.

The MTA will have rewritten the incoming addresses in @-form and qualified
them if they were unqualified on mail that originated locally.  If I may
assume that nobody is out to break it by putting at-signs into comments,
would this do?

:0h
* ^To:(.+$)To:
tooz=| formail -zx To:
:0E # not important to strip leading whitespace
* ^To:\/.+
{ tooz = $MATCH }

:0
* 9876543210^0 ! ^To:
* 1^1 tooz ?? [^      ]@([^   ]\.)+[^         ]
* $ -1^1 tooz ?? $pattern
action

Hmm, excluding at-signs in comments and quoted-string and route
addresses (e.g., <@stolaf.edu:guenther(_at_)gac(_dot_)edu>) I think it should.
Hmm, I wonder...


<Philip searches through his backpack for a moment, fiddles around some,
pulls out a package labeled "Strategic Nuclear Rcfile Cluster", sets a
timer, then runs>


If you download 
        ftp://ftp.gac.edu/pub/guenther/822rcs.tar.gz

you'll find a set of rcfiles that can perform an rfc822 parse of address
header field values, as well as general rfc822 structured header lexical
analysis.  WARNING: these rcfiles *REQUIRE* procmail version 3.14 or
later due to extensive use of the SWITCHRC variable.

Given them, the above would be done by creating an rcfile (let's call it
"do_address.rc") containing the following:

        :0
        * $${GROUP_END:+!}
        * ! RETURN ?? $ ^^($pattern)^^
        {
            NOTALL = 1
        }

Then in the main rcfile you would put

        NOTALL
        rcdir = $HOME/path/to/directory/with/822rcs
        addressrc = $HOME/path/to/do_address.rc
        REGEXP = "To|Cc"
        eachheaderrc = $rcdir/822address-list

        INCLUDERC = $rcdir/822headers

        :0
        * $${NOTALL:+!}
        action

Nice, eh?


Philip Guenther


FOR THOSE WHO WANT TO KNOW MORE:

Here's the contents of the README file from the 822rcs package:


You'll probably want a copy of rfc822 on-hand when you read the following.
        ftp://nis.nsf.net/documents/rfc/rfc0822.txt

The rcfiles in 822rcs.tar.gz can be broken in a few groups:

LEXICAL ANALYSIS:
        822lex          Get the next lexical token
        822lex-sc       Get the next lexical token, skipping comments
            822qstring          used by 822lex* to handle quoted-strings
            822comment          used by 822lex* to handle comments
            822domain-literal   used by 822lex* to handle domain-literals

822lex and 822lex-sc remove the next token from the TEXT variable,
returning its type in TOKEN (one of atom, quoted-string, domain-literal,
(for 822lex) comment, or a literal special character ('@', ':', etc)).
For atoms, quoted-strings, and domain-literals, the variables VALUE
and QVALUE contain the 'semantic' and 'quoted' values of the token.
Those are identical for atoms, but for quoted-strings and domain-literals
the semantic value loses escaping backslashes, while the quoted value
leaves _necessary_ ones in.  For example, the semantic value of
        "foo\"(\)"
is
        foo"()
while the quoted value is
        foo\"()

If UNLEX is set, 822lex and 822lex-sc will unset it and return
immediately without altering TEXT, TOKEN, VALUE, or QVALUE.  This allows
parsing routines to 'push back' a token.

The contents of comments are never saved.
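
As a sketch of the push-back convention, a parsing rcfile might fetch a
token and hand it back when it is not the kind it wants.  This assumes
that 822lex-sc can be invoked through INCLUDERC and that _rcfileprefix
is already set as described under NOTES; neither detail is spelled out
here, so check the rcfiles themselves before relying on it:

        # assumption: the lexer is called via INCLUDERC
        INCLUDERC = ${_rcfileprefix}lex-sc

        # TOKEN now holds the token type; if it isn't an atom,
        # push the token back for whoever parses next
        :0
        * ! TOKEN ?? ^^atom^^
        {
            UNLEX = 1
        }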


PARSING:
        822phrase               These are named after the semantic
        822dot-atom             categories from rfc822
        822domain
        822addr-spec
        822route
        822mailbox

These all parse the value in the TEXT variable by calling 822lex-sc
repeatedly.  The values returned are saved and built up into the variables
RETURN and QRETURN.

822mailbox also sets the DISPLAY_NAME and QDISPLAY_NAME variables to the
phrase that appears before the '<', or nothing in the phraseless form.
For example, in the mailbox
        Philip Guenther <guenther(_at_)gac(_dot_)edu>

The display name is "Philip Guenther".


        822mailbox-list

This parses a comma-separated list of mailboxes.  This would be used
when parsing Reply-To: headers, for instance.  Upon parsing a complete
address, it calls $addressrc as an rcfile (via INCLUDERC) which can then
use RETURN, QRETURN, DISPLAY_NAME, and QDISPLAY_NAME to decide how to
handle the parsed mailbox.


        822address

Group addresses allow for the (surprise!) grouping of mailboxes in address
lists.  They are not allowed for all address header fields; for example,
group addresses are not allowed in the Reply-To: header field.  822address
can parse one group address or mailbox.  It uses 822mailbox-list when
parsing groups and sets GROUP_COMMENT and QGROUP_COMMENT to the comment
that preceded the colon that started the group.


        822address-list

This parses a comma-separated list of addresses, such as can be found
in the To:, Cc:, and From: header fields.  Upon parsing each address
(including ones nested in a group) it calls $addressrc as an rcfile (via
INCLUDERC).  Furthermore, upon completing a group, it calls $addressrc
again with GROUP_END set to a non-empty value.


For example, if TEXT contained
        recursive rcfile fanatics:
                Philip Guenther <guenther(_at_)gac(_dot_)edu>,
                "David W. Tamkin" <dattier(_at_)mcs(_dot_)net>;,
                procmail-users(_at_)procmail(_dot_)org
then addressrc would be invoked four times with the following variable
values:

GROUP_COMMENT   = recursive rcfile fanatics
QGROUP_COMMENT  = recursive rcfile fanatics
DISPLAY_NAME    = Philip Guenther
QDISPLAY_NAME   = Philip Guenther
RETURN          = guenther(_at_)gac(_dot_)edu
QRETURN         = guenther(_at_)gac(_dot_)edu

GROUP_COMMENT   = recursive rcfile fanatics
QGROUP_COMMENT  = recursive rcfile fanatics
DISPLAY_NAME    = David W. Tamkin
QDISPLAY_NAME   = "David W. Tamkin"
RETURN          = dattier(_at_)mcs(_dot_)net
QRETURN         = dattier(_at_)mcs(_dot_)net

GROUP_COMMENT   = recursive rcfile fanatics
QGROUP_COMMENT  = recursive rcfile fanatics
GROUP_END       = 1

GROUP_COMMENT   =
QGROUP_COMMENT  =
DISPLAY_NAME    =
QDISPLAY_NAME   =
RETURN          = procmail-users(_at_)procmail(_dot_)org
QRETURN         = procmail-users(_at_)procmail(_dot_)org
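
To see those invocations for yourself, a minimal addressrc could simply
write each set of values to the procmail log.  The file name
"log_address.rc" is made up for illustration; the GROUP_END test reuses
the condition idiom from do_address.rc:

        # log_address.rc (hypothetical name)
        # ordinary invocation: one parsed mailbox
        :0
        * $${GROUP_END:+!}
        {
            LOG = "address: $RETURN (display name: $DISPLAY_NAME)
"
        }

        # otherwise this is the extra call made when a group ends
        :0 E
        {
            LOG = "end of group: $GROUP_COMMENT
"
        }

Point addressrc at that file and the four calls in the example should
show up as three "address:" lines and one "end of group:" line.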




HEADER ANALYSIS:

        822headers
            822each-header

822headers calls the rcfile specified by the eachheaderrc variable
for each header field in the current message that matches the regexp in
REGEXP (include neither the colon nor the '^').  HEADER will be set to the
name of the header field involved (e.g., "To") while TEXT will be set to the
header field value.  You can thus just set the eachheaderrc variable to
"/path/to/822address-list" in most cases.

822each-header is a helper file to handle the recursion for 822headers.
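
eachheaderrc does not have to be one of the parsers.  As a rough
sketch, a trivial rcfile of your own can be used to watch what
822headers hands over; "show_header.rc" is a made-up name and the paths
are placeholders:

        # show_header.rc (hypothetical name)
        LOG = "field $HEADER has value: $TEXT
"

and in the main rcfile:

        rcdir = $HOME/path/to/directory/with/822rcs
        REGEXP = "To|Cc"
        eachheaderrc = $HOME/path/to/show_header.rc

        INCLUDERC = $rcdir/822headers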


ERRORS:

If a lexical or syntactic error occurs, the various rcfiles set the
ERROR variable to a description of the error and then do a SWITCHRC
to $ERRORRC.  No high level error recovery is attempted right now, and
there's no easy way to abort the parsing.  'Returning' from $ERRORRC may
result in it being called again immediately, or in control returning to
some intermediate level, so don't do that for now.  About the only useful
(?) thing you can do on an error is give up and invoke a delivering recipe
(or set HOST to kill procmail).
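
A minimal "give up" handler along those lines might look like the
following.  The file name "822error.rc" is invented for the example,
and delivering to $DEFAULT is just one choice of delivering recipe:

        # in the main rcfile, before INCLUDERC = $rcdir/822headers
        ERRORRC = $HOME/path/to/822error.rc

        # 822error.rc (hypothetical name)
        LOG = "822rcs parse error: $ERROR
"
        # deliver normally so that procmail stops here
        :0:
        $DEFAULT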


NOTES:

With one exception, these rcfiles perform a 'loose' parse, accepting
forms that are considered obsolete and even accepting one illegal form
(unquoted periods in phrases).  The exception is that they don't accept
mixtures of domain-literals and atoms for the domain part of an addr-spec
(rfc822 allows for "guenther(_at_)gac(_dot_)[138(_dot_)236](_dot_)edu") as
that's now universally considered a thinko in the standard.  In fact,
these rcfiles effectively parse the complete syntax (including the
'obsolete' forms) given in the internet draft produced by the DRUMS
working group that may some day replace rfc822.  Let us all pray for
that to happen sooner rather than later.

It is expected that these rcfiles will be significantly slower than a
simpler (though not truly correct) regexp.  As such I would recommend
that they *not* be used unless you have a specific need for a full
rfc822 parse.  If you think you do, you probably don't.  I wrote these
because I wondered if I could, not because I wanted to use them...

These rcfiles *REQUIRE* procmail version 3.14 or later due to extensive
use of the SWITCHRC variable.

Most of the rcfiles included expect the variable "_rcfileprefix" to
be set to the path to the directory containing all the rcfiles, plus
"/822".  The 822headers rcfile will set that variable to the correct
value, assuming the other rcfiles are in the same directory as it is.
If you want to call any of the other rcfiles directly you'll need to
set that variable before doing so.
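
For instance, parsing an address list that you already hold in a
variable might be set up roughly as follows.  The address is a dummy,
and invoking 822address-list through INCLUDERC with TEXT preset is an
assumption based on how the other pieces are called, not something
documented here:

        rcdir = $HOME/path/to/directory/with/822rcs
        _rcfileprefix = $rcdir/822
        addressrc = $HOME/path/to/do_address.rc

        TEXT = 'Some One <someone@example.com>'
        # assumption: called directly via INCLUDERC
        INCLUDERC = ${_rcfileprefix}address-list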
