procmail

counting addressees

1997-02-05 13:24:22
Ken Marsh wrote,

| On Tue, 4 Feb 1997, David W. Tamkin wrote:

Er, no.  I didn't write the words that followed that attribution.  Ken wrote
them:

| > | I'd like to count the '@' signs in the header fields, and use
| > | the results in a score. I've asked this question before, and
| > | appreciate the help, but previous responses posted here can't do
| > | this with internal egrep, they only count the number of LINES
| > | with '@' in them, which is not enough.

I said this in reply:

| > Those responses were wrong, Ken, or perhaps you misunderstood them.
| > Procmail's internal egrep most certainly can count the @-signs.

And Ken has now answered,

| They were wrong. The original contained:
| 
| * 1^0 ^(Resent-)?To:
| * 1^1 ^(Resent-)?To:.*,
| * 1^0 ^(Resent-)?Cc:
| * 1^1 ^(Resent-)?Cc:.*,

| To count commas. The left anchor botched the rule. I just assumed 
| that the internal egrep only counted lines after playing with it
| a little.

The problem, Ken, was that what you really needed was the number of
addressees, so we were trying to count only the names in such lines.
We didn't want to include the @-signs in places like Message-Id:, From:,
or Return-Path:.  Now you have a new approach: you'll allow fifteen
@-signs to allow for those in other headers and a reasonable number of
addressees.

| That still leaves a problem with the internal Egrep, and that's counting
| commas or '@'s in ONE FIELD. (Actually two fields, To: and Cc:) To do
| that, I'll still have to resort to an external program like countat. One
| might extract a field using formail first, but that still means
| launching another external prog!

Well, we can do it with formail, though, and you won't need the disk space
for countat.

| > ... you coded it incorrectly.
| 
| Yes, I knew that, :)  but there was no example or grammar given! :O
| That's why I turned to you all... :(

And that is why we have been answering!

| Is there anywhere in the manual pages that I was supposed to divine
| this syntax? I'll admit guilt if it's in there somewhere...

The procmailsc manual shows the weight of a condition and the condition
itself on the same line in all examples.  I do not know where you came
up with the idea of separating them.

| > By the way, "h" is meaningless on a recipe whose action line is to launch
| > a brace nest.  If you want any or all recipes inside the braces to act only
| > on the head, they need their own h's: flags are not inherited.
| 
| The procmailrc man page says that the "H" flag means to egrep only
| the header. Since I didn't want the rule acting on the body, and
| I had started with the internal egrep, I had the H flag. The body
| might have a bunch of @'s in it that I don't want to count.

Stop.  I said "h".  You're saying "H".  They are two entirely different
animals.

"h" means "pipe, save, or filter only the head."  It is meaningless if
the action line is a left brace.  The default is "hb".

"H" means "unless overridden for a specific condition, the search area is
only the head."  "H" is the default; you must specify "HB" or "BH" to
search both head and body.  If you had an "H" on that recipe it would have
been unnecessary (because "H" is the default) but it would not have been
*wrong*.  "h" on a recipe for launching a brace nest is harmless, but
because it shows confusion about what the flags do and where they belong,
it is wrong.  (For example, you might think it feeds only the head to all
the recipes inside the braces.)
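
To make the placement concrete, here is a minimal sketch.  The Subject:
pattern and the mailbox name "report-heads" are made up for illustration;
only the position of the flags matters:

 :0                  # no h here: the action is a brace nest
 * ^Subject:.*report
 {
   # h belongs on the recipe that actually saves, pipes, or filters
   :0 h:
   report-heads
 }

With that layout the inner recipe saves only the head; a stray "h" on the
outer recipe would change nothing.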

| > Now, if you really want to count all @-signs in the head and allow for
| > fifteen legit ones (From:, Message-Id:, a few Received:), here's how
| > you code it without outside programs:
| > 
| >   :0c # Are you sure you want a clone here?    
| >   * 1^1 @
| >   * -15^0
| >   {
| >    whatever you had in the braces
| >   }
| 
| Wow, so achingly simple... :) Of course, I had to go and write C code...
| 
| But wait, it still doesn't allow me to count @'s (or commas) within one
| particular header line.

Right.  I'll get back to that at the bottom.

| > Note that that counts @-signs, not lines with @-signs.  If you actually had
| > wanted to count lines with @-signs you could have done that this way:
| > 
| >   * 1^1 ^.*@.*$
| 
| OK, but can I make sure that it only counts @'s in the header? You
| deprecated my use of H, but I think I'll need it to prevent counting
| @'s in Mime attachments, for example.

No, I deprecated the use of "h" in that place.  "H" would be right --
unnecessary to specify, because it is the default, but sometimes you want
to put it there as a reminder to yourself.  You'll notice that I used
neither "h" (because it is meaningless when you're launching a brace nest)
nor "H" (because it is in effect by default unless you use "B" without "H").
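
If you do want that reminder, a sketch of the same counting recipe with the
default written out might look like this.  MANYATS is a hypothetical
variable standing in for whatever belongs inside the braces:

 :0 H   # H is the default anyway; spelled out only as a reminder
 * 1^1 @
 * -15^0
 { MANYATS = yes }

Each @-sign in the head adds 1 and the last condition subtracts 15 once, so
a head with eighteen @-signs scores 18 - 15 = 3 and the recipe matches,
while a head with fifteen or fewer scores zero or less and it does not.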

Now, there are two other problems with counting @-signs: (1) the count will
miss local addresses, which need not contain an @-sign, if the message
originated on your site and (2) even in To: and Cc: headers, @-signs might
appear in the comments on addresses (where the real name belongs, but where
people often put other things).

So here's my suggestion, and sorry that it runs a program, but it's the best
I can think of.  (Unfortunately extraction into $MATCH will not get multiple
headers with the same field name; we can get the first or the last that way
but not both nor any in between.)  We see whether there are Resent- headers,
and if so, save them in a variable; otherwise we save regular addressee
headers there:

 :0h # H is implicit; this is h
 * ^Resent-(To|Cc):
 TARGETS=`formail -czxResent-To: -xResent-Cc:`
 :0Eh
 TARGETS=`formail -czxTo: -xCc: -xApparently-To:`

Note that we used formail's -c option to get single lines from continued
headers.  Now, the number of addressees should be the number of non-empty
lines (procmail always sees an extra empty line at the end of a search area)
plus the number of commas:

 :0
 * 1^1 TARGETS ?? ^.+$
 * 1^1 TARGETS ?? ,
 { ARROWS = $= }
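
As a worked example with made-up addresses (the exact whitespace formail
leaves behind may differ slightly), suppose the head contains

 To: alice@example.com, bob@example.com,
     carol@example.com
 Cc: dave@example.com

Then TARGETS should come out as two lines, roughly

 alice@example.com, bob@example.com, carol@example.com
 dave@example.com

which gives two non-empty lines plus two commas, so ARROWS is 4, matching
the four addressees.  To use the count in a score instead of saving it, a
sketch along these lines ought to work; both the threshold of fifteen and
the variable name MASSMAIL are arbitrary placeholders:

 :0
 * 1^1 TARGETS ?? ^.+$
 * 1^1 TARGETS ?? ,
 * -15^0
 { MASSMAIL = yes }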

This will still overcount if someone has a comma inside a name comment.
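
For instance, with a made-up address, a header like

 To: "Doe, Jane" <jane@example.com>

yields one non-empty line and one comma, so it counts as two addressees
although there is only one.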