procmail

Re: Detecting multiple To: headers?

1999-06-07 11:09:07
There are still serious problems with the recipient counting recipe
attributed to David and reposted by Philip, apparently with a typo found by
Era. I don't know where the problems slipped in. I suspect that Philip went
too far back and pulled out a work-in-progress message instead of the
finished thing.

One problem with the published version is in the following recipe. Its action
runs only when the total score is positive, so it clearly does not work if $=
is not enough to swing $EXCESS positive in one iteration. Remember, EXCESS is
initially negative.

  # Okay, increment EXCESS by $=.
  :0
  * $ $EXCESS^0
  * $ $=^0
  { EXCESS = $= }

This can be fixed by moving the assignment outside the curly braces, so that
EXCESS is updated even while the total is still negative.
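
To see why the placement matters, here is a small Python model of the two
variants (a sketch, not procmail code). It assumes, as the recipe above
implies, that a scored recipe's action runs only when the summed score is
positive, while $= holds the sum whether or not the recipe matched. The names
step_published, step_fixed and run, and the sample numbers, are made up for
illustration.

```python
# Model of the "increment EXCESS by $=" recipe; all names are illustrative.
# Assumption: a scored recipe's action runs only if the total score is > 0,
# but the total is available afterwards as $= whether or not it matched.

def step_published(excess, increment):
    """Assignment inside the braces: EXCESS changes only on a positive sum."""
    total = excess + increment      # * $ $EXCESS^0  and  * $ $=^0
    if total > 0:                   # recipe matched, so { EXCESS = $= } runs
        excess = total
    return excess

def step_fixed(excess, increment):
    """Assignment outside the braces: EXCESS always takes the new sum."""
    return excess + increment       # EXCESS = $=  placed after the recipe

def run(step, start, increments):
    """Apply one step per counted header and return every EXCESS value seen."""
    values = [start]
    for inc in increments:
        values.append(step(values[-1], inc))
    return values

if __name__ == "__main__":
    # Hypothetical case: a limit of 10 recipients, four headers worth 3 each.
    print(run(step_published, -10, [3, 3, 3, 3]))
    print(run(step_fixed, -10, [3, 3, 3, 3]))
```

In this model the published variant never moves off -10, however many
headers are counted, while the fixed variant crosses zero on the fourth.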


Philip's optimization, too, is faulty. Since EXCESS is negative until we have
too many recipients, the decapitation and recursion should proceed if EXCESS
remains negative, not if it has gone positive. The test, then, should be
negated, or should test for the sign, as:

  :0
  * EXCESS ?? ^^-
  ...


Finally, a recent remark by David causes concern about iterating for each
recipient header. Many of the multiple-recipient messages I get have
individual To: headers for each recipient. If you do not properly implement a
bail-out with a low threshold, there will be many recursions. In my tests, I
could sustain about 33 recursions before failure, which could be failure to
obtain a file descriptor, as David indicated. With 200 To: headers and no
bail-out, or no working bail-out, or a bail-out only after 50 recipients, this
seems to be inviting failure.


Here is my update. This can be used to get an exact count as it is, or can be
set to bail out of the counting when any iteration causes the count to exceed
the threshold MAXRECIP. It iterates once for each recipient header which has
at least one comma, which significantly reduces the depth of the recursion.
It also takes advantage of knowing that the first header in the slurped
headers is a recipient header with a comma in it.
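
As a rough illustration of that counting scheme, here is a Python model (not
procmail, and not a replacement for the recipes that follow). It makes
simplifying assumptions: each header is a single physical line, and every
comma on a recipient line separates two recipients. The function name
count_recipients and the sample addresses are invented for the example.

```python
import re

# Same alternation as THDR_ in the recipe; procmail matches case-insensitively.
THDR = re.compile(r"^((resent|apparently)-)?(to|b?cc):", re.IGNORECASE)

def count_recipients(header_lines):
    """Return (estimated recipients, number of recursive passes needed)."""
    recips = 0
    passes = 0
    for line in header_lines:
        if not THDR.match(line):
            continue                # not a recipient header
        recips += 1                 # one for the header itself
        commas = line.count(",")
        if commas:
            recips += commas        # one more per comma on that line
            passes += 1             # only comma headers cost a recursion
    return recips, passes

if __name__ == "__main__":
    headers = [
        "To: a@example.com, b@example.com",
        "To: c@example.com",
        "Cc: d@example.com, e@example.com, f@example.com",
        "Subject: hello, world",
    ]
    print(count_recipients(headers))
    # 200 separate To: headers with one address each need no recursion at all.
    print(count_recipients(["To: user%d@example.com" % i for i in range(200)]))
```

The second call shows the point about depth: in this model, headers without
commas add to the count but never to the number of passes.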

Excuse the comments on condition lines -- my preprocessor handles those. Also
note that variable names with a trailing underscore represent regexps.
Finally, the trailing .+, on the definition of THDRS_ isn't needed. It only
serves to (maybe slightly) reduce the size of the slurped headers.


############################################### spam ................... RECIP-#
## Count number of recipients. This adds one for each named header, and, if   ##
## any named header has a comma, it uses a recursive routine to count them.   ##
## This overcounts if any commas are present in comments, which are allowed.  ##
##   Some folks sum Resent-xx *or* non-Resent-xx headers. This sums 'em all.  ##
##   NOTE: If bail-out is enabled in the called rc, you may remove the last   ##
## three conditions and change RULE report to RECIP-X (RECIPients-eXceeded).  ##
################################################################################
  THDR_=((resent|apparently)-)?(to|b?cc):         ## headers to search
  :0                          ## preload RECIPS with
  * $ 1^1 ^$THDR_             ## the number of recipient headers
  { RECIPS=$= }               ## in the message
  THDRS_=$THDR_.*,(.*$)+.+,   ## remaining comma headers
  :0                          ## if any commas in any recip header,
  * $  ^\/$THDRS_             ##  slurp remaining head to last comma
  { INCLUDERC=$PMRC/commacount.rc }     ##  then apply recursive counter
  :0                          ## compute:
  * $    $RECIPS^0            ## recipient count
  * $ -$MAXRECIP^0            ## less max allowed (+ if toooo many)
  *  -2147483647^0            ## less bigmin (large -, not bigmin, if too many)
  *   2147483647^0            ## add back bigmax if still here
  * $  $MAXRECIP^0            ## and MAXRECIP to make # right
  { RULE="$RULE RECIP-$=" }   ## append this violation to broken rule list


##  This is  commacount.rc  ####################################################
  HEADS=$MATCH                ## save remaining headerset
  :0                          ##  extract first header from remaining headers
  *             HEADS ?? ^\/.+,         ##   through last comma
  *         1^1 MATCH ?? ,    ##  count commas in this matched line
  * $ $RECIPS^0               ##  add previous count, if any
  { RECIPS=$= }               ##  save new count
  :0                          ## now, see if there are more comma THDR lines
## NOTE: Next two lines implement short-circuit bail-out. Enable at will.
##* $ $MAXRECIP^0             ##  but, if we have already exceeded MAXRECIP
##* $  -$RECIPS^0             ##  with counted RECIPS, bail out
  * $             HEADS ?? ^(.*$)+\/$THDRS_       ## more lines after this line
  { INCLUDERC=$_ }            ## if so, curse again!
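
The arithmetic in the final scoring recipe of the first listing (the five
weighted conditions ending in RECIP-$=) is easy to misread, so here is a small
Python model of it. It assumes the behaviour I understand from procmailsc(5):
the score is bounded by +/-2147483647, a recipe is abandoned as a non-match
once the score reaches the lower bound, and further weighted conditions are
skipped once it reaches the upper bound. The names score and recip_rule are
illustrative only.

```python
BIGMAX = 2147483647
BIGMIN = -2147483647

def score(weights):
    """Sum weights under the assumed procmail scoring rules.

    Returns the final score if the recipe matches, or None if it does not.
    """
    total = 0
    for w in weights:
        total += w
        if total <= BIGMIN:         # minus infinity: give up, no match
            return None
        if total >= BIGMAX:         # plus infinity: skip remaining weights
            return BIGMAX
    return total if total > 0 else None

def recip_rule(recips, maxrecip):
    """The five weighted conditions of the RECIP-$= recipe, in order."""
    return score([recips, -maxrecip, BIGMIN, BIGMAX, maxrecip])

if __name__ == "__main__":
    for recips in (3, 10, 11, 12):
        print(recips, recip_rule(recips, 10))
```

With MAXRECIP at 10, the model reports no match for 3 or 10 recipients and
yields the exact counts 11 and 12 otherwise, which is the value that ends up
in the RULE string.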

-- 
Rik Kabel          Old enough to be an adult              
rik(_at_)netcom(_dot_)com