procmail

Re: Matching code (I added a second lime to this subject just to prove it can be dome)

1997-06-15 09:07:00
era replied to Brozen:
 > Using the code below and the list of words below (that is
 > match.word-reject), would it match if the word "add" and "me" were located
 > ANYWHERE in the message? 
<... example essentially "egrep 'add.*me'" ...>
No; egrep is line-oriented. (And this would not be too hard to test, I
believe.)
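
A quick way to see the line-oriented behaviour without egrep at hand: the
sketch below is Python standing in for egrep, not egrep itself, and the
function name and sample text are made up for illustration. It tests each
line separately, which is roughly what a line-oriented tool does.

```python
import re

PATTERN = re.compile(r"add.*me")

def line_oriented_match(text):
    """Mimic a line-oriented tool: test every line on its own."""
    return any(PATTERN.search(line) for line in text.splitlines())

# Hypothetical sample: "add" and "me" sit on different lines.
split_sample = "please add\na note for me\n"
# Same words, but on one line.
joined_sample = "please add me\n"

split_result = line_oriented_match(split_sample)    # no single line has both
joined_result = line_oriented_match(joined_sample)  # both words on one line

print(split_result, joined_result)
```

Only the second sample matches, since no single line of the first one
contains both words.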

One could do it with perl. I don't think it is a good idea, so I'll let
you hang yourself on your own.

[multiline headers being a problem, eg:]
Subject: This is a very long Subject: line which contains both the word "add"
      and the word "me", but on different physical lines

I would expect that this is not a big problem in real life in most
situations.

Close to 99% of the mail I get with multiline subjects is spam. The rest
is people showing off stupid mail tricks. The original question had been
about the body, but I'll continue with this for a while.

  On the other hand, I can imagine that "add.*me" would by mistake
match a lot of other things you didn't intend it to ... the typos on
the subject line of this message are intentional ;^)

For those who missed it: {add}[.*]{me}

Minimal matching, such as used by procmail and some/most egreps:
(I {add}[ed a second li]{me} to this subject just to prove it can be dome)

Maximal matching, such as used by capturing regexps:
(I {add}[ed a second lime to this subject just to prove it can be do]{me})
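
The two spans can be reproduced with any engine that offers both modes.
The sketch below uses Python's re module as a stand-in (it is not the
procmail or egrep engine), where ".*?" asks for the minimal match and ".*"
for the maximal one.

```python
import re

subject = ("Re: Matching code (I added a second lime to this subject "
           "just to prove it can be dome)")

# Non-greedy: stop at the first "me" after "add".
minimal = re.search(r"add.*?me", subject).group(0)
# Greedy: run on to the last "me" in the string.
maximal = re.search(r"add.*me", subject).group(0)

print(minimal)  # added a second lime
print(maximal)  # added a second lime to this subject just to prove it can be dome
```

Either way the pattern matches, which is the point: neither span has
anything to do with a request to be added.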

  You probably want something more like "add[ ]+me". (The brackets
contain a space and a tab. The tab is just paranoia. You could do well
with just "add +me".)

Subject: [Mothers-Against-Drunk-Driving-List] How many madd members?

Oops. That doesn't work so well either. RTFM the regexp engine you are
using and then add word boundary checks. E.g. procmail:

        * ^Subject:.*\<add +me\>

And perl ('is' flags to more closely resemble procmail):

        m/^Subject:.*\badd +me\b/is;
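
To check the effect of the boundary checks on the "madd members" subject,
here is a small sketch using Python's re module (an assumption for
illustration; "\b" there behaves like the perl "\b" above, not exactly
like procmail's "\<" and "\>"). The second subject line is made up.

```python
import re

loose = re.compile(r"^Subject:.*add +me", re.IGNORECASE)
bounded = re.compile(r"^Subject:.*\badd +me\b", re.IGNORECASE)

madd = "Subject: [Mothers-Against-Drunk-Driving-List] How many madd members?"
request = "Subject: please add me to the list"  # hypothetical genuine request

loose_hits_madd = bool(loose.search(madd))            # "madd me(mbers)" slips through
bounded_hits_madd = bool(bounded.search(madd))        # rejected by the boundaries
bounded_hits_request = bool(bounded.search(request))  # still caught

print(loose_hits_madd, bounded_hits_madd, bounded_hits_request)
```

The pattern without boundaries is fooled by "madd members"; the bounded
one rejects it and still catches the genuine request.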

For matching the body, you can do something like:

        :0B
        * \<add +me\>
        { #Match: do stuff here
        }

Note the total lack of a ".*" in it. Johnbob's perl scoring script
"jmdigest" (see <URL:http://www.io.com/~johnbob/jm/index.html>) or
my rewrite of it (see <URL:http://www.netusa.net/~eli/filtering.html>)
has lots of sample REs for catching scam phrases. Mine is highly
adept at identifying alt.sex.* spam, because I need a filter that
does that. I am still trying to massage it back to handling email as
well as posts. But for just cannibalizing RE ideas it should prove a
rich source.
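
For trying the body recipe above outside of procmail, a rough Python
sketch follows. It is not how procmail works internally; the function
name and sample messages are made up, and it only handles simple
non-multipart messages. As in the recipe, the search is unanchored, so no
leading ".*" is needed.

```python
import re
from email import message_from_string

BODY_RE = re.compile(r"\badd +me\b", re.IGNORECASE)

def body_matches(raw_message):
    """Search only the body, ignoring the headers (non-multipart only)."""
    body = message_from_string(raw_message).get_payload()
    return isinstance(body, str) and bool(BODY_RE.search(body))

# Hypothetical messages for illustration.
in_body = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "\n"
    "Hi,\n"
    "please add me to the list.\n"
)
in_header_only = (
    "From: someone@example.com\n"
    "Subject: add me\n"
    "\n"
    "Nothing to see here.\n"
)

body_hit = body_matches(in_body)            # phrase is in the body
header_hit = body_matches(in_header_only)   # phrase is only in the Subject:

print(body_hit, header_hit)
```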

Elijah
------
Please do not CC me when replying to the list.  It is not my responsibility to
prove to you my mail is not spam, if mail to you bounces it will not be resent.

  • Re: Matching code (I added a second lime to this subject just to prove it can be dome), Eli the Bearded <=