procmail

Re: capturing the OR that succeeds ???

2004-05-14 04:45:18
On Thu, May 13, 2004 at 09:14:57PM -0500, Michael D Schleif wrote:

* Dallman Ross <dman(_at_)nomotek(_dot_)com> [2004:05:14:01:41:30+0200] scribed:

On Thu, May 13, 2004 at 11:47:42AM -0500, Michael D Schleif wrote:

Suppose that I have this:

   :0 BD
   * 9876543210^0 aaa
   * 9876543210^0 bbb
   * 9876543210^0 ccc
   * 9876543210^0 ddd
   * 9876543210^0 eee
   {
      :0 fhw
      | formail -I "X-Procmail: alphabet soup"
   
      :0 A
      alphabet/soup
   }


However, now I want to identify _which_ condition was satisfied, and
plug that into the X-Procmail line:

   | formail -I "X-Procmail: alphabet soup: $MATCH"


Use the match token, '\/', before each regex.  Since you start the
[snip]

However, you're still running body greps up to five times on "hit" messages
and all of five times on non-triggering messages.  You can have the same
effect with one pass like so:

  MATCH
  :0 D fw
  * B ?? ()\/(aaa|bbb|ccc|ddd|eee)
  | formail -I "X-Procmail: alphabet soup: $MATCH"

This has also avoided the unnecessary extra recipe you have above.

Could this be an improvement?

      MATCH
      :0 BD
      * 9876543210^0 aaa
      * 9876543210^0 bbb
      * 9876543210^0 ccc
      * 9876543210^0 ddd
      * 9876543210^0 eee
      {
          :0 D fhw
          * B ?? ()\/(aaa|bbb|ccc|ddd|eee)
          | formail -I "X-Procmail: alphabet soup: $MATCH"
      
          :0 A
          alphabet/soup
      }

No, it's worse.  :-)  Now you're body-grepping everything at least twice,
possibly up to six times, for no new reason.

Won't this avoid superfluous condition checks, at least in the
overwhelming majority of cases where *NONE* of the OR'd conditions
succeed?

If most of the mail matches none of the regexes, then most of the mail,
with full body, gets passed to procmail's internal egrep five times before
leaving the above recipe.  So no, it is not particularly efficient.
And your second draft is not an improvement.

Also, your last line suggests, "avoided the unnecessary extra recipe"
-- what do you mean?  I want to send *ALL* messages that meet any of
the OR'd conditions to `alphabet/soup'.

I think you didn't understand, or at least didn't see, part of the
syntax change suggested.  You originally had a case-sensitive ("D"
flag on the initiation line) body-only condition check ("B" on the
initiation line).  After you "found your man," so to speak, you
had a second recipe, a filtering one ("f", which almost by definition
also calls for "w", on the initiation line) to add the X-header.  I
combined those two concepts into one recipe.

We can turn it into one recipe by removing the body-only limitation
from the (first) recipe as a whole and instead telling just the
relevant condition(s) to act on the body.  That's what this
syntax does:

  * B ?? regex

So now we can have some condition lines that operate on the body,
while other conditions operate on the headers (the default), and
the recipe's action still acts on the entire message.

Moreover, I wanted the action line to operate only on the message
headers!  So I used the "h" flag up-top.  We're only adding an X-header
to the existing header-set, after all.

Remember that "H" and "B" flags concern how procmail acts on *conditions*
(with "H" being the default); while "h" and "b" flags concern how
procmail acts on the *action*-line (with both together being the default).
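
A minimal sketch of that distinction (untested; the folder name
"body-hits", the Subject pattern and the "X-Seen" header are made up
purely for illustration and appear nowhere in your recipes):

  # Upper-case "B": the condition is matched against the body only.
  # No "h" or "b": the action (delivery) gets headers and body together.
  :0 B
  * aaa
  body-hits

  # No "H" or "B": the condition is matched against the headers (default).
  # Lower-case "h": only the headers are piped through the filter; the
  # body is passed along untouched.
  :0 fhw
  * ^Subject:.*soup
  | formail -I "X-Seen: yes"

The two recipes are independent illustrations, not a pipeline: a message
delivered by the first one never reaches the second.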

Since you've stated in another follow-up message that you do care
about order, David Tamkin's remarks are pertinent, and you might want
something like (I hope you are using a monospace font to view this!):

 # case-sensitive parsing
 #   | filtering action ("function-box pass-through")
 #   | |
 #   | |   (while we wait for the filter action to finish)
 #   | |  /
 #   | | /  and the action line will concern only the headers!
 #   | || /
  :0 D fw h
  * B ?? 9876543210^0 ()\/aaa
  * B ?? 9876543210^0 ()\/bbb
  * B ?? 9876543210^0 ()\/ccc
  | formail -I "X-Procmail: alphabet soup: $MATCH" 

  :0 A
  alphabet/soup

Remember that, since you say most mail won't match, you're
going to subject most mail to all your body-grep conditions.
You can try to avoid that with various algorithmic choices.
You could pre-select for only messages that contain attachments,
and that would be the first thing I would do.  (In fact, that's
what I do do in Virus Snaggers.)  Now we can use our handy

   B ?? regex

condition lines mixed with another condition line that operates
on the message headers only:

  :0 D fw h
  * ^Content-Type:.*(multi|attach)
  * B ?? 9876543210^0 ()\/aaa
  * B ?? 9876543210^0 ()\/bbb
  * B ?? 9876543210^0 ()\/ccc
  | formail -I "X-Procmail: alphabet soup: $MATCH" 

Since a condition's default is to operate on the headers only, we didn't
need to write that first condition verbosely, though we could have:


  * H ?? ^Content-Type:.*(multi|attach)

(But I try not to spell out the default: if I state something
explicitly in my code, that should signal to me that I was doing
something *other* than the default.)
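
Pulling the pieces from this thread together, here is a sketch
(untested) of how the whole thing might look: unset MATCH first,
pre-select on the headers, record whichever string matched, then
deliver.  Nothing in it is new; it only combines the fragments above,
on the assumption that you still want the hits saved to alphabet/soup.

  # A bare variable name unsets it, so no stale $MATCH carries over.
  MATCH

  # Header pre-check first, then the weighted body conditions; only the
  # headers are piped through formail.
  :0 D fw h
  * ^Content-Type:.*(multi|attach)
  * B ?? 9876543210^0 ()\/aaa
  * B ?? 9876543210^0 ()\/bbb
  * B ?? 9876543210^0 ()\/ccc
  | formail -I "X-Procmail: alphabet soup: $MATCH"

  # Runs only if the recipe above matched.
  :0 A
  alphabet/soup

Messages that fail the Content-Type check never reach the body greps at
all, which is the whole point of putting that condition first.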

Okay, that's enough for this round, I hope.

-- 
dman

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail