procmail

Re: Matching code(I added a second lime to this subject just to prove it can be dome)

1997-06-15 08:33:00
At 12:49 PM 6/15/97 +0300, era eriksson wrote:
On Sun, 15 Jun 1997 03:56:25 -0400 (EDT),
Brock Rozen <brozen(_at_)webdreams(_dot_)com> wrote:
Using the code below and the list of words below (that is
match.word-reject), would it match if the word "add" and "me" were located
ANYWHERE in the message?
<... example essentially "egrep 'add.*me'" ...>

No; egrep is line-oriented. (And this would not be too hard to test, I
believe.)
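
For anyone who wants to test this without a mail setup, here is a rough Python sketch of what "line-oriented" means. The helper name line_oriented_match is made up for illustration, and Python's re module merely stands in for egrep here; the point is only that each line is tested on its own.

```python
import re

def line_oriented_match(pattern, text):
    """Return True if any single line of text matches pattern (egrep-like behaviour)."""
    regex = re.compile(pattern)
    # Each line is searched separately, so a match can never span a newline.
    return any(regex.search(line) for line in text.splitlines())

same_line = "Please add me to the list"
split_lines = "Please add\nme to the list"

print(line_oriented_match("add.*me", same_line))
print(line_oriented_match("add.*me", split_lines))
```

The first call should report a match and the second should not, since "add" and "me" sit on different lines.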

Obviously, this can create some problems. If anybody has suggestions, I'd
appreciate it. Thanks.

The "canonical" problem is that it will not match if the two words
technically are part of the same field but the field is split across
several lines, like

Subject: This is a rather long Subject: line which contains both the word "add"
      and the word "me", but on different physical lines

Well, procmail concatenates all continued header fields (internally), so
that's not a problem in this example, but it is a problem within the body.
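
To illustrate the difference, here is a small Python sketch that approximates joining continued header lines. It is not procmail's actual code: unfold_header is a hypothetical helper, and replacing the line break plus indentation with a single space is an assumption made for the example.

```python
import re

def unfold_header(raw_header):
    """Join continuation lines (those starting with a space or tab) onto the previous line."""
    # Assumption: the line break and leading whitespace collapse to one space.
    return re.sub(r"\r?\n[ \t]+", " ", raw_header)

folded = ('Subject: This is a rather long Subject: line which contains both the word "add"\n'
          '      and the word "me", but on different physical lines')

unfolded = unfold_header(folded)

# Line by line, neither physical line contains "add" followed by "me".
per_line = any(re.search("add.*me", line) for line in folded.splitlines())
# After unfolding, the field is one logical line and the pattern can match.
joined = re.search("add.*me", unfolded) is not None
print(per_line, joined)
```

Body text gets no such treatment, so the same two words split across body lines would still be missed.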

I would expect that this is not a big problem in real life in most
situations.
 On the other hand, I can imagine that "add.*me" would by mistake
match a lot of other things you didn't intend it to ... the typos on
the subject line of this message are intentional ;^)
 You probably want something more like "add[  ]+me". (The brackets
contain a space and a tab. The tab is just paranoia. You could do well
with just "add +me".)

Actually, if the egrep is new enough to understand it, probably
"\<add +me\>" would be better, to avoid picking up things such as
"should we add memorization to our school curriculum?"

The technique is "risky" anyway (because of remaining possible false
matches, including some messages in this thread, I think), but if the
purpose of the filter is just to "weed approximately", then it should
do that job.
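
The three patterns discussed in this thread can be compared side by side with a short Python sketch. This is only an approximation: Python's re has no "\<" and "\>", so "\b" is used in their place, and the names PATTERNS and which_match are invented for the example.

```python
import re

# The three candidates from the thread, loosest first.
PATTERNS = {
    "loose": re.compile(r"add.*me"),
    "spaced": re.compile(r"add[ \t]+me"),
    "bounded": re.compile(r"\badd[ \t]+me\b"),  # \b stands in for \< and \>
}

def which_match(line):
    """Return the names of the patterns that match somewhere in line."""
    return [name for name, regex in PATTERNS.items() if regex.search(line)]

samples = [
    "please add me to the list",
    "I added a second lime to this subject",
    "should we add memorization to our school curriculum?",
    "please do not add me",
]

for sample in samples:
    print(which_match(sample), sample)
```

The last sample is the kind of remaining false match mentioned above: every pattern accepts it even though the sender wants the opposite.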

Cheers,
Stan
