Re: Efficient use of OR-matches and $MATCH

Ralph SOBEK <sobek(_at_)irit(_dot_)fr> writes:

I have some questions concerning the regexp matching process, and
would like to do it efficiently since I have 1000-2000 messages per
week which get analyzed by the recipe.  I have looked at Jari's
Procmail resources, but did not find what I wanted.

I have a recipe that starts like this:

:0
* 9876543210^0 $ ^(From|Subject):.*\/\<(${regexps})s?\>.*$
* 9876543210^0 B ?? $ ()\/\<(${regexps})s?\>.*$


Be careful here: \< and \> match actual characters, so MATCH will
start with whatever the \< matched.  It's also not clear from what follows
whether you really need (or want) to capture into MATCH everything on
the line after the match against $regexps.
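
A minimal sketch of one way around the first problem: put the \/ after
the \< so that the character consumed by \< stays out of MATCH, and end
the regexp where the capture should end.  The variable name is the one
from the question; showing only the header condition and dropping the
trailing \>.*$ are assumptions about what is wanted.  Dropping \> also
gives up the word-boundary check at the end of the keyword.

```procmail
# Sketch only: MATCH starts at the keyword itself, not at the
# character that \< matched in front of it.
:0
* $ ^(From|Subject):.*\<\/(${regexps})s?
{
  EXPR = $MATCH
}
```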

{
  EXPR = $MATCH

  ...
}

The variable $regexps consists basically of a huge OR-match of mostly
strings, and a few simple regexps, for example:

      regexps = "fee|fie|foe|fum"

I would like the interior recipe, noted as ... above, to consist of
exclusions; think of it as a huge case statement:

#  case 1
  :0 B
  $* $EXPR ?? "fee"
  $* ! fee fie fum
  {
      OKAY = yes
  }


Those dollar signs are all unneeded (even the one before EXPR), and
the leading ones will keep procmail from recognizing those lines as
conditions.
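
As a sketch, case 1 with the dollar signs removed might look like the
following.  Dropping the double quotes around fee is an additional
assumption: without a $ on the condition line they would presumably be
matched as literal quote characters.

```procmail
#  case 1 (sketch)
  :0 B
  * EXPR ?? fee
  * ! fee fie fum
  {
      OKAY = yes
  }
```

The first condition is tested against the value of EXPR; the second,
because of the B flag, against the message body.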

#  case 2
  :0 B
  $* $EXPR ?? "fie"
  $* ! fie.*foe
  {
      OKAY = yes
  }

#  case 3
  :0 B
  $* $EXPR ?? "fum"
  $* ! foe.*fum
  {
      OKAY = yes
  }

  .... #2

# default
  :0
  {
      OKAY = no
  }


Here, OKAY should be true if at least one case succeeds.  Case 2
should be tried if case 1 is not executed or if it fails.  Does it
suffice to add the `e' flag, creating the following start line:

  :0 Be


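'e' is for "error" (run only if the preceding recipe's action failed),
'E' is for "else" (run only if the preceding recipe was not executed).
Make the above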
        :0 BE
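
Put together, the chain could be sketched as below.  The conditions
are the ones from the question with the dollar signs and quotes
dropped; giving the default recipe an E flag as well is an assumption,
since without it the default would run after every case and reset OKAY.

```procmail
# case 1
:0 B
* EXPR ?? fee
* ! fee fie fum
{ OKAY = yes }

# case 2: tried only if case 1 was not executed
:0 BE
* EXPR ?? fie
* ! fie.*foe
{ OKAY = yes }

# default (assumed to need E): runs only if no case above was executed
:0 E
{ OKAY = no }
```

Once one recipe in the chain executes, the E recipes immediately
following it are skipped, which gives the "else if" behaviour.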

Does the huge $regexps serve any purpose?  Should I just have `k'
independent recipes?  Would this work?  I currently actually have:


If you can do most (or all) of your nested matches against EXPR, then
it'll probably improve your overall efficiency.  It depends at least
partially on the exact distribution of messages and how they map into
the various possibilities.

:0
* 9876543210^0 $ ^(From|Subject):.*\/\<(${regexps})s?\>.*$
* 9876543210^0 B ?? $ ()\/\<(${regexps})s?\>.*$
{
  EXPR = $MATCH

  <122 * ! ... negative conditions>
}

This is actually overkill, since some conditions depend upon the value
in (or matches) $EXPR.


Without a more detailed description of _what_ you're trying to do,
it's hard to say whether this is the most efficient _how_.


Philip Guenther