procmail

Re: Counting score program exit code and negation

1997-02-05 13:37:32
Ken Marsh <kmarsh(_at_)charm(_dot_)net> writes:
On Tue, 4 Feb 1997, David W. Tamkin wrote:
| I'd like to count the '@' signs in the header fields, and use
| the results in a score. I've asked this question before, and
| appreciate the help, but previous responses posted here can't do
| this with internal egrep, they only count the number of LINES
| with '@' in them, which is not enough.

Those responses were wrong, Ken, or perhaps you misunderstood them.
Procmail's internal egrep most certainly can count the @-signs.

They were wrong. The original contained:

* 1^0 ^(Resent-)?To:
* 1^1 ^(Resent-)?To:.*,
* 1^0 ^(Resent-)?Cc:
* 1^1 ^(Resent-)?Cc:.*,

to count commas. The left anchor botched the rule. After playing with it
a little, I just assumed that the internal egrep only counted lines.

That still leaves a problem with the internal egrep: counting commas
or '@'s in ONE FIELD. (Actually two fields, To: and Cc:.) To do that,
I'll still have to resort to an external program like countat. One
might extract a field using formail first, but that still means
launching another external program!


Hmm, I knew this looked familiar.  I was the one to propose the above
incorrect solution on the 12th of January, at which time David pointed
out to me that it didn't work, so I came up with the basis for the
following correct solution, which I mailed to him and the list, but not
to you.  Sorry!  I've revised it since then to be more compact, but it
should still be understandable.


# Set $R to "Resent-" iff any header field indicates that this message
# has been resent.  Note that the presence of *any* of these indicates
# that the non-"Resent-" versions should be ignored by the mail system.
# However, you may still want to count them, depending on whether
# you want to match messages that have been resent after being sent to
# multiple addresses.
:0
* ^Resent-(From|Date|To|Cc|Message-Id):
{ R="Resent-" }
:0E
{ R= }

# First, offset the count, then clear MATCH.  Try to match a To: header
# (counting a match as one), then add one for each comma in the value,
# clear MATCH again, then try to match a Cc: header (counting a match
# as one), then one again for each comma in the value.  If the total is
# positive then there were more than 19 addresses given, and we bounce
# it using the EXITCODE & HOST hack.  The clearing of MATCH is so that
# if there is no To: or Cc: header, the old $MATCH won't be counted against.
:0
* -19^0
* ^^\/
* 1^0 $ ^${R}To:\/.*
* 1^1 MATCH ?? ,
* ^^\/
* 1^0 $ ^${R}Cc:\/.*
* 1^1 MATCH ?? ,
{ EXITCODE=77 HOST }
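
The arithmetic of the recipe above can be checked outside procmail. What
follows is only a rough Python model of it, not procmail itself: the
function name address_score is made up for illustration, and it assumes
each To: or Cc: field sits on a single unfolded line and that commas
appear only between addresses.

```python
import re

def address_score(header, resent=False, offset=-19):
    """Model the recipe: offset, +1 per To:/Cc: field found, +1 per comma in it."""
    prefix = "Resent-" if resent else ""   # plays the role of $R
    score = offset                         # "* -19^0"
    for field in ("To", "Cc"):
        # Like "^${R}To:\/.*": the rest of the line stands in for $MATCH.
        pattern = r"^" + re.escape(prefix) + field + r":(.*)$"
        m = re.search(pattern, header, re.MULTILINE | re.IGNORECASE)
        if m:
            score += 1                      # "1^0": the field counts once
            score += m.group(1).count(",")  # "1^1 MATCH ?? ,": one per comma
    return score
```

With this model, twenty addresses in To: give a score of 1 (positive, so
the recipe would fire), while nineteen give 0 (it would not).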



| According to the procmail manual, I should be able to run
| an external program and use the return code as a score
| if I "negate" it. It gives no examples (that I can find).

That is true, but you coded it incorrectly.

Yes, I knew that, :)  but there was no example or grammar given! :O
That's why I turned to you all... :(
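
For anyone following along without the C source of countat, here is a
hedged Python stand-in for what such a counting program might do. The
names count_at and main are hypothetical, and the sketch assumes the text
to be counted arrives on standard input. One thing worth knowing either
way: an exit status only carries values 0 to 255, so the count is capped.

```python
import sys

def count_at(text):
    """Number of '@' characters in text, capped at 255 to fit an exit status."""
    return min(text.count("@"), 255)

def main(stream=None):
    # Assumption: the header is piped to the program on stdin.
    if stream is None:
        stream = sys.stdin
    return count_at(stream.read())
```

Installed as a script, it would finish with sys.exit(main()) so that the
count becomes the exit code that a negated program condition can score.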

| I've tried:
| 
| :0 h c
| * -15^0
| * !? /usr/kmarsh/bin/countat

Er, no.  The weights and the condition have to be on the same line, as
Lars Kellogg-Stedman explained.  The way you have it now, it says this:

Is there anywhere in the manual pages that I was supposed to divine
this syntax? I'll admit guilt if it's in there somewhere...


Well, scoring isn't mentioned on the main manpages to avoid confusing
new users.  However, at the top of the procmailsc(5) manpage it says:

     [*] w^x condition

If you think in terms of BNF grammars, then you would say:

condition_line ::= * condition
condition ::= w^x condition
            | ! condition
            | $ condition
            | var ?? condition
            | ? command
            | regexp
            | > number
            | < number


But if you think in BNF grammars, wouldn't the above seem obvious?
Each line is a condition: everything follows from that.
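
The grammar says where the weight goes, not what it does. As I read
procmailsc(5), the n-th match of a "w^x" condition contributes
w * x^(n-1) to the total, and the recipe fires if the final total is
positive. A small Python sketch of that arithmetic (weighted_score is a
name invented here, and this is my reading of the manpage rather than
procmail's own code):

```python
def weighted_score(w, x, n):
    """Total contributed by a 'w^x' condition whose pattern matched n times."""
    # First match adds w, second adds w*x, third adds w*x*x, and so on.
    return sum(w * x ** k for k in range(n))
```

So "1^0" adds 1 no matter how often the pattern matches (only the first
match counts), while "1^1" adds 1 for every match. That is why the
recipes in this thread pair a "^0" offset with a "^1" counter.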


By the way, "h" is meaningless on a recipe whose action line is to launch
a brace nest.  If you want any or all recipes inside the braces to act only
on the head, they need their own h's: flags are not inherited.

The procmailrc man page says that the "H" flag means to egrep only
the header. Since I didn't want the rule acting on the body, and
I had started with the internal egrep, I had the H flag. The body
might have a bunch of @'s in it that I don't want to count.

'H' != 'h'.  The 'h', 'b', 'f', 'i' and 'r' flags are meaningless on
a recipe that opens a nested block.

The 'H' flag is the default, so unless you are also using the 'B' flag
with it, it's pointless.


Now, if you really want to count all @-signs in the head and allow for
fifteen legit ones (From:, Message-Id:, a few Received:), here's how
you code it without outside programs:

  :0c # Are you sure you want a clone here?    
  * 1^1 @
  * -15^0
  {
   whatever you had in the braces
  }
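
To see what total that recipe arrives at, here is a rough Python model.
The function name head_at_score is hypothetical, and splitting head from
body at the first blank line is the usual mail convention rather than
anything taken from procmail's source.

```python
def head_at_score(message, allowance=15):
    """Count '@' in the head only, then subtract the allowance."""
    # The head ends at the first blank line; the body is never looked at.
    head = message.split("\n\n", 1)[0]
    return head.count("@") - allowance   # "* 1^1 @" plus "* -15^0"
```

A positive result corresponds to the recipe firing. The @-signs in the
body do not enter into it, since only the head is searched by default.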

Wow, so achingly simple... :) Of course, I had to go and write C code...

But wait, it still doesn't allow me to count @'s (or commas) within one
particular header line.

See above.


Note that that counts @-signs, not lines with @-signs.  If you actually had
wanted to count lines with @-signs you could have done that this way:

  * 1^1 ^.*@.*$
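
The difference between the two kinds of counting is easy to demonstrate
with ordinary regular expressions. This Python sketch is only an analogy
to procmail's internal egrep (the function names are invented here): a
bare "@" is found once per occurrence, while a pattern anchored to a
whole line can be found at most once per line.

```python
import re

def count_at_signs(head):
    """One match per '@' character."""
    return len(re.findall(r"@", head))

def count_at_lines(head):
    """One match per line containing at least one '@'."""
    # The anchored pattern swallows the whole line, so it matches once per line.
    return len(re.findall(r"^.*@.*$", head, re.MULTILINE))
```

For a head of "To: a@x, b@y" followed by "Cc: c@z" and "Subject: hi",
the first function finds three matches and the second finds two.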


OK, but can I make sure that it only counts @'s in the header? You
deprecated my use of H, but I think I'll need it to prevent counting
@'s in MIME attachments, for example.

The reason David deprecated the 'H' flag was that it's the default.
If you want to search the body you have to explicitly say so.


Philip Guenther