procmail

Re: Why did it miss?

2002-07-30 07:57:49
On 30 Jul, Martin McCarthy wrote:
| > <p align="Center"><i><font size="-2" color="#ffffff">Under Bill s.1618 TITLE
| >     III passed by the 105th US Congress this letter cannot be<br>
| > 
| > Here is the recipe which I would have thought would catch the above:
| > 
| > :0B:
| > * (section|bill).*(1618|301)
| > /var/mail/junk
| > 
| > Yet, it got through. Am I missing something incredibly obvious?
| 
| I'd guess that it is one of those things that is only obvious when you
| know it.  My guess is that the mail contains an encoded version of the
| text that you quoted and your mail client is decoding it before showing
| it to you.  So the mail doesn't actually contain that text, and hence
| procmail doesn't catch it.
| 
| Of course, there are other possibilities which are better considered if
| you provide the section of your verbose procmail log which shows what is
| happening when (and if) that recipe is being checked.
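
The encoding guess quoted above can be illustrated outside procmail. This is only a sketch in Python, assuming the body was base64-encoded (the thread does not say which encoding, if any, was used). The sample text is the spam line quoted earlier with the HTML stripped, and the pattern is the original condition transcribed into Python's re syntax.

```python
import base64
import re

# The original condition, transcribed to Python. procmail matches
# case-insensitively by default, hence re.IGNORECASE.
pattern = re.compile(r"(section|bill).*(1618|301)", re.IGNORECASE)

# What the mail client displays after decoding.
plain = "Under Bill s.1618 TITLE III passed by the 105th US Congress"

# What a filter scanning the raw body would see, ASSUMING base64 encoding.
encoded = base64.b64encode(plain.encode("ascii")).decode("ascii")

raw_hit = pattern.search(encoded) is not None
decoded_hit = pattern.search(base64.b64decode(encoded).decode("ascii")) is not None

print("raw body matches:    ", raw_hit)
print("decoded text matches:", decoded_hit)
```

A body condition only ever sees the raw form, so a pattern that looks right against the displayed text can still miss.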

Two other possibilities that have bitten me in the past:

1. Some recipe before this one explicitly delivered the message, so that
it never got to this recipe.

2. The message text as quoted above is not exactly what the actual
message contains, and there is a newline between "Bill" and "1618".

(Hmmm... I wonder how many filters the following will trip...)

A similar condition I use looks like [1]:

wsstar='[       ]*'   # defined elsewhere, used below
xlegal="(Th(is|ese)\>+messages?\>+((is|are)\>+)?sent\>+in\>+(accord|\
compli)ance|Este\>+mensaje\>+se\>+envía\>+con\>+la\>+complacencia|\
(new|pending)\>+((Federal|.*Spam)\>).*(bill|l(aw|egislation)|statute)|\
la\>+nueva\>+legislación\>+|S\.?${wsstar}(630|1618)\>|H\.?R\.?${wsstar}\
3113|Unsolicited\>+Electronic\>+Mail\>+Act|www\.spamlaws\.com/us\.html)"

* $ 1^1 ()\/$xlegal

The "\>+" and "\<+" tokens match word boundaries (actually non-word
characters -- i.e. \W and not \b in perl-ese). These also match newline
characters, which "." does not, so they allow matching over multiple
lines.  The "+" modifier makes sure that two or more non-word characters
between words (e.g. several spaces, or a space and a newline) still match.
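
For anyone who wants to see the difference without touching their rcfile, here is a rough analogy in Python's re module. It is not procmail: \W+ stands in for "\>+" (per the perl-ese comparison above), the wrapped sample line is hypothetical, and boundary_pattern is just an illustrative rewrite of the original condition, not a tested recipe.

```python
import re

# The original condition, transcribed to Python (procmail is
# case-insensitive by default, hence re.IGNORECASE).
dot_pattern = re.compile(r"(section|bill).*(1618|301)", re.IGNORECASE)

# Illustrative rewrite: \W+ plays the role of procmail's "\>+" and,
# unlike ".", it also consumes newlines.
boundary_pattern = re.compile(r"bill\W+s\.?\W*1618", re.IGNORECASE)

one_line = "Under Bill s.1618 TITLE III"
wrapped = "Under Bill\ns.1618 TITLE III"  # hypothetical wrap after "Bill"

dot_one_line = dot_pattern.search(one_line) is not None
dot_wrapped = dot_pattern.search(wrapped) is not None
boundary_one_line = boundary_pattern.search(one_line) is not None
boundary_wrapped = boundary_pattern.search(wrapped) is not None

print("'.*' on one line:    ", dot_one_line)
print("'.*' across newline: ", dot_wrapped)
print("\\W+ on one line:     ", boundary_one_line)
print("\\W+ across newline:  ", boundary_wrapped)
```

The ".*" version only works while both words stay on the same line; the non-word-character version does not care where the line was folded.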

I first saw such liberal use of word boundaries in a recipe of Era
Eriksson's many years ago. That's not to say he invented or discovered
it; I am simply giving credit where it is due in MY case. In my
experience, they increase the efficacy of body searches significantly.

[1] The variable is not necessary. This is part of a scored recipe with
35 conditions that mix and match (and reuse) 5 variables, and this is
less clutter (for me).

-- 
Reply to list please, or append "8" to "procmail" in address if you must.
Spammers' unrelenting address harvesting forces me to this...reluctantly.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
