procmail

searching all headers but one

1997-08-02 11:06:00
When areray(_at_)io(_dot_)com asked,

| > How would you write a condition like that, i.e., if that phrase is the
| > Subject:, that's okay, but /dev/null the mail if it's in any other
| > header?

process(_at_)qz(_dot_)little-neck(_dot_)ny(_dot_)us suggested,

| :0:
| * for name removal information
| * ! ^Subject:.* for name removal information
| spam-mail
| 
| Is pretty good. It fails for the text both in the subject and in the
| other headers. A scoring recipe could fix that, here is my untested
| guess at what such a recipe might look like:
| 
| :0:
| * 4^1 for name removal information
| * -2^1 ! ^Subject:.* for name removal information
| spam-mail

Well, not quite.  If the phrase appears only in the subject, that recipe
will score 4: 4 on the first condition for its being in the head somewhere
and 0 on the second one for not matching the not-in-Subject test.  (The
exponent parameter on a *negated* regexp search condition is meaningless;
absence of any match to the pattern throughout the search area can occur
only once.)  The net positive score would then get the message round-filed.
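
To make the arithmetic concrete, here is a rough tally of what the quoted
scoring recipe would produce, assuming the phrase appears at most once per
header line.  The comment lines are bookkeeping only, not procmail output:

  # phrase appears in...       1st cond.  2nd cond.  total  result
  # Subject: only                 +4          0        4    filed (wrong)
  # one other header only         +4         -2        2    filed
  # Subject: and one other        +8          0        8    filed
  # nowhere in the head            0         -2       -2    passed over

Only the first row goes wrong: the negated condition contributes nothing
when the phrase is in the subject, so nothing offsets the 4 scored by the
first condition.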

  :0
  * ^Subject:\/.*
  { SUBJECT_CONTENTS=$MATCH }

  :0:
  * 1^1 for name removal information
  * -1^1 SUBJECT_CONTENTS ?? for name removal information
  spam-mail

is one way to do it.  It scores 1 for each appearance of "for name removal
information" in the head but then subtracts one for each appearance in the
subject line.  If the remainder is greater than zero, the phrase must have
shown up somewhere else in the head.
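
One way to check a scoring recipe like this is to run it by hand against a
saved message with verbose logging turned on, since the log then records
what each weighted condition scored.  A minimal sketch, assuming a scratch
directory /tmp/pmtest that already exists; the names test.rc and sample.msg
are hypothetical placeholders:

  # test.rc -- scratch rcfile; every path here is a placeholder
  MAILDIR=/tmp/pmtest
  LOGFILE=$MAILDIR/test.log
  VERBOSE=on
  # anything the recipe does not catch is thrown away during the test
  DEFAULT=/dev/null

  :0
  * ^Subject:\/.*
  { SUBJECT_CONTENTS=$MATCH }

  :0:
  * 1^1 for name removal information
  * -1^1 SUBJECT_CONTENTS ?? for name removal information
  spam-mail

Running "procmail ./test.rc < sample.msg" and then reading test.log should
show the score for each condition and whether the message landed in
spam-mail.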

Now, some of you may be wondering why I extracted the subject contents into
a variable instead of just doing this:

  :0:
  * 1^1 for name removal information
  * -1^1 ^Subject:.*for name removal information
  spam-mail

The reason was the off-chance that "for name removal information" might occur
twice or more in the subject but nowhere else in the head.  This last recipe
would count every appearance in the subject for the first condition but
subtract only one for the second condition.  Actually, the problem could be
resolved more simply by anchoring *both* expressions (and thus we'd be counting
lines that contain the phrase rather than appearances of the phrase):

  :0:
  * 1^1 ^.*for name removal information
  * -1^1 ^Subject:.*for name removal information
  spam-mail

The weight on the second condition is still -1^1, not -1^0, because there
may be more than one subject header.
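
A quick tally of the two awkward cases may help show why the anchoring and
the ^1 exponent both matter.  Again, the comment lines are bookkeeping only,
not procmail output:

  # case: phrase twice in one Subject: line, nowhere else
  #   first condition unanchored     +2 -1 =  1   filed (wrong)
  #   both conditions anchored       +1 -1 =  0   passed over
  #
  # case: two Subject: headers, each with the phrase, nowhere else
  #   anchored, second weight -1^1   +2 -2 =  0   passed over
  #   anchored, second weight -1^0   +2 -1 =  1   filed (wrong)

Since the original question asked about sending such mail to /dev/null
rather than to a folder, the same conditions could be used as in the sketch
below.  The second colon (the local lockfile) is dropped on the assumption
that there is nothing to lock when the destination is /dev/null:

  :0
  * 1^1 ^.*for name removal information
  * -1^1 ^Subject:.*for name removal information
  /dev/null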
