procmail

Re: How do I get the message body ???

2002-08-11 16:12:25
From: Dave Kirkby <davek(_at_)medphys(_dot_)ucl(_dot_)ac(_dot_)uk>

I'm not able to follow the syntax of this one:

:0:
* ^From:.*\/\<[a-z0-9=+_-]+@blow.com
* MATCH ?? ^^\/[^@]+
* $ ? grep -w $MATCH mylist
wasinlist

but  I don't wish to store users at one site, but lots of sites, so they
will not all be in blow.com. The example is so complex I have no hope of
modifying it myself! Sorry.

All right, let's look at it, because it's really not complex once
you know the syntax.

The "\/" has special meaning in procmail.  It's called a MATCH
token.  It is one atom, meaning you don't want to separate the
backslash and the slash with other things.  Whatever the regex
to the right of it matches will be assigned to a variable
called $MATCH.  Later, we can copy $MATCH, if we want or need
to, into a variable with a less evanescent value, since MATCH
will be overwritten the next time the token is used, and since
it's not a particularly meaningful name in a mnemonic sense.
I didn't bother with that here, though.

The \< is another atomic token to procmail.  It means,
essentially, "the left edge of a word."  If in your list
is "poindexter@example.com" and you get mail from another
user, "dexter@example.com", or a third user,
"er@example.com", you don't want the match to succeed,
right?  You could instead write the recipe to precede the
to-be-matched string with whitespace (space or tab), but
the RFCs don't require whitespace after the colon that ends
the header field name.  Anyway, \< is a nice shortcut to
delimit word-edges, and it can even span line breaks.

The stuff inside the square brackets is a character class
or range.  I simply included characters that are valid per
RFCs as parts of email addresses.  You don't really need to
be that anal when designing the recipe, though.  Oh: the
hyphen that is part of the class -- and not just setting
a range such as "a-z" -- must be on either end of the
bracketed class.  That's why the last character in my
class is the hyphen.

The + means one-or-more of whatever came before.
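
To see why the position of the hyphen matters, here is a small
shell sketch.  It uses grep -E rather than procmail itself, and
procmail has its own regex engine, so treat it as an analogy
only.  The function names are made up for illustration, and
LC_ALL=C is set so that the range is defined by ASCII order.

```shell
# Hyphen last: it is a literal member of the class.
only_class_chars() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z0-9=+_-]+'
}

# Hyphen between "+" and "_": in the C locale this becomes a range
# from "+" to "_", which also admits uppercase letters and other
# punctuation that the class was never meant to include.
only_range_chars() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z0-9=+-_]+'
}

for word in 'pete-smith' 'PETE'; do
    if only_class_chars "$word"; then a=accepted; else a=rejected; fi
    if only_range_chars "$word"; then b=accepted; else b=rejected; fi
    printf '%-12s hyphen-last: %s  hyphen-in-middle: %s\n' "$word" "$a" "$b"
done
```

With the hyphen at the end, "PETE" is rejected because the class
holds only lowercase letters; with the hyphen in the middle, the
accidental range lets it through.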

There are some procmail-special egrep-type oddities about
the condition syntax normally being "leftward-parsimonious,"
but switching to "leftward-greedy" after the implementation
of the MATCH token; but I don't need to get into that in
this simplified explanation.

Now we can move to the next condition line.  In recent versions
of procmail (which version are you using, btw?), we can
take a second pass (and further passes thereafter) at the
value that was first saved to the MATCH variable.  That's
what we're doing here. 

First, the "??": it lets us test a variable, named on the
left, against a string or regex on the right.  Notice that
we don't need a "$" before the variable placed at the left.
(We would to the right, were we to use one there, and an
almost analogous situation happens on the next line, but
I'll get to that in a minute.)  I'm tired of scrolling up
and back down while writing this, so I'll reprint part of
the recipe here:

 * MATCH ?? ^^\/[^@]+
 * $ ? grep -wi $MATCH mylist

Okay, where was I?  Oh: ^^, which is yet another
atomic token and means "beginning of field": the leftmost
character in our MATCH value is where we will start
our comparison.  Next comes another match token, because
we're going to save the results of the new constraint
to the right of the token back into the $MATCH var.

What are we saving?  From the left of whatever we
had already up to a "@" sign.  The local part of
the email address.
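
As a rough shell analogy (this is not procmail syntax), the
same "everything up to the @" extraction can be sketched with
parameter expansion.  The function name local_part is
hypothetical.

```shell
# Print the part of an address before the first "@".
# ${1%%@*} strips the longest suffix that starts with "@".
local_part() {
    printf '%s\n' "${1%%@*}"
}

local_part 'pete@blow.com'
```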

You say now, though, that you want the full address,
not just the local part.  Okay: forget this whole line!
There, that was easy, right?  :)

Without the constraint of the now-abandoned line,
we revert to $MATCH's value, which in my example
would be "pete@blow.com" if he was in the From:
line.

Let's go to the next line, with grep in it.  It
technically is still a condition line to procmail.
We're using special test syntax, though.  The other
poster, who offered the formail solution, used this
construct as well.  With the question mark, we're
telling procmail to run the command that follows
(probably in a subshell, unless I'm misremembering
something here, which could be) and to use its exit
status to decide whether the condition succeeds: exit
status zero means success, anything else means failure.
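
The exit-status idea can be tried out at a shell prompt.  The
sketch below is only a stand-in for what the "?" condition does;
condition_holds is a hypothetical helper, and the list file and
addresses are made up.

```shell
# Run a command quietly and report whether its exit status
# would count as a successful condition (zero) or not.
condition_holds() {
    if "$@" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

list=$(mktemp)
printf '%s\n' 'pete@blow.com' > "$list"

# grep exits 0 when it selects at least one line, 1 when it selects none.
condition_holds grep -w 'pete@blow.com' "$list"
condition_holds grep -w 'nobody@blow.com' "$list"

rm -f "$list"
```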

Oh, I forgot to explain the "$": it is now
necessary because we are asking procmail to
interpret the value stored in the variable.
(As you know or can at least abstract from the
recipe you first provided, such a syntactical
marker is not needed on action lines.  Nor,
btw, is it needed on initiating lines; one
could assign "SEA=c" somewhere above and then
have a first recipe line be

        :0 $SEA

to clone, if one wanted to be weird.  I haven't
tested this, but, unless my memory has corrupted 
what I read here once, this should work.  But
I've gone far afield now.)

Okay: the point was that we need the "$" on
condition lines that will expand variables.  That's
why it's there.

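Notice that I've added an "i" flag to grep, which
the other poster had in his.  It's the right idea,
since it makes the comparison case-insensitive.  I
should also probably use the "x" flag to grep, so
that only a whole line of mylist counts as a match.
Okay, "grep -xi" instead of "grep -iw".
Good.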
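
The difference between the flags is easy to check outside
procmail.  This is a sketch with a made-up list file and
made-up addresses; the function names are hypothetical.  The
-F flag in the third function is an addition that is not in
the recipe: it makes grep treat the address as a fixed string,
so that "." stops acting as a regex wildcard.

```shell
list=$(mktemp)
printf '%s\n' 'poindexter@example.com' 'Pete@Blow.com.au' \
    'sam@blow-com.org' > "$list"

# -w: the match must sit between non-word characters or line edges.
word_match() { grep -wiq -- "$1" "$list"; }

# -x: the pattern must match an entire line of the list.
line_match() { grep -xiq -- "$1" "$list"; }

# -F added: fixed-string comparison against an entire line.
fixed_line_match() { grep -Fxiq -- "$1" "$list"; }

for addr in 'pete@blow.com' 'sam@blow.com.org' 'dexter@example.com'; do
    for check in word_match line_match fixed_line_match; do
        if "$check" "$addr"; then result='found'; else result='not found'; fi
        printf '%-20s %-17s %s\n' "$addr" "$check" "$result"
    done
done

rm -f "$list"
```

With -w, "pete@blow.com" is found inside "Pete@Blow.com.au",
because the "." that follows it is not a word character; -x
rejects it.  With -x alone, "sam@blow.com.org" still matches
the line "sam@blow-com.org", because the dot is a wildcard;
only the -F variant rejects that one.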

Presumably, we found the $MATCH value, which was
the email address pulled from the From: line in
the first condition line in the recipe, within
mylist.  You know what happens on the action
line!

So I'd write your recipe:

  :0:
  * ^From:.*\/\<[a-z0-9=+_-]+@blow.com
  * $ ? grep -ix $MATCH mylist
  wasinlist

The other way looked fine too, but I don't see
the need to run formail and echo on top of grep
here.  The decision is really one of taste, though.

HTH,
Dallman
