procmail

Re: block mail by comparing word in subject

2005-04-28 05:49:28
On Thu, Apr 28, 2005 at 09:39:15PM +1000, Peter Jones wrote:
On Thu, 28 Apr 2005 01:14 am, Dallman Ross wrote:
Since what matches to the right of the `\/' match token is
rightward-greedy, your `+' there is essentially meaningless.

Okay, thanks for that.  This is essentially my first time playing
with '\/' so I've obviously got a bit to learn...

So let's just grab any Subject starting with the first
non-whitespace character, shall we?

  * ^Subject:.*\/[^ ].+

I guess that "+" would also be unnecessary?

Apparently the "+" *is* required!  When I tested my original pattern 
without the trailing "+" ("Subject:[ ? ?]+\/.") it set $MATCH to the 
first letter of the subject line -- which is how I would have expected it 
to behave.  It may be rightward-greedy, but a single "." will still only 
match one character...

That's not what I meant by "rightward-greedy."  (And I hope you
saw my follow-up list mail explaining a bit more about that.)

"Greedy" does not mean "puts the lie to the exact meaning of
the chosen regex." :-)

"Greedy" means -- well, on Halloween eve, the greedy kid grabs
as many M&Ms as his hand can hold when he reaches into the bag.
(Okay, okay, so all kids are greedy.)

Look:

  2:16pm [~/Mail] 297[0]> cat test.rc
 ######################### start ##########################
 
 DEFAULT = /dev/null
 
 SPACE = ' '
 TENSPACES = "$SPACE$SPACE$SPACE$SPACE$SPACE$SPACE$SPACE$SPACE$SPACE$SPACE"
 
 SOMEHEADER = "X-Field-Marker:${TENSPACES}Here is some mixed non-space text."
 
 :0
 * $ SOMEHEADER ?? :$SPACE+\/.*
 { MYMATCH = $MATCH }
 
 LOG = "
 MYMATCH is >$MYMATCH<
 "
 
 :0
 * $ SOMEHEADER ?? :.*\/[^$SPACE].*
 { MYMATCH = $MATCH }
 
 LOG = "
 MYMATCH is >$MYMATCH<
 "
 
 HOST = byebye
 
 ########################## end ###########################
 


  2:16pm [~/Mail] 298[0]> procmail -m VERBOSE=on test.rc < /dev/null
 procmail: [14948] Thu Apr 28 14:16:17 2005
 procmail: Assigning "MAILDIR=."
 procmail: Rcfile: "test.rc"
 procmail: Assigning "DEFAULT=/dev/null"
 procmail: Assigning "SPACE= "
 procmail: Assigning "TENSPACES=          "
 procmail: Assigning "SOMEHEADER=X-Field-Marker:          Here is some mixed non-space text."
 procmail: Assigning "MATCH="
 procmail: Assigning "MATCHLEFT="
 procmail: Matched "         Here is some mixed non-space text."
 procmail: Match on ": +\/.*"
 procmail: Assigning "MYMATCH=         Here is some mixed non-space text."
 procmail: Assigning "LOG=
 MYMATCH is >         Here is some mixed non-space text.<
 "
 
 MYMATCH is >         Here is some mixed non-space text.<
 procmail: Assigning "MATCH="
 procmail: Assigning "MATCHLEFT="
 procmail: Matched "Here is some mixed non-space text."
 procmail: Match on ":.*\/[^ ].*"
 procmail: Assigning "MYMATCH=Here is some mixed non-space text."
 procmail: Assigning "LOG=
 MYMATCH is >Here is some mixed non-space text.<
 "
 
 MYMATCH is >Here is some mixed non-space text.<
 procmail: Assigning "HOST=byebye"
 procmail: HOST mismatched "panix5.panix.com"
   Folder:                                                                     0
 


  2:16pm [~/Mail] 299[0]> procmail -m VERBOSE=off test.rc < /dev/null 
 
 MYMATCH is >         Here is some mixed non-space text.<
 
 MYMATCH is >Here is some mixed non-space text.<
 
 

The comments about "greedy" were meant to convey that the part of
the regex to the left of the match token is not greedy: there,
$SPACE+ will match only one space, regardless of how many there
are.  We can see what the left-side (non-greedy, pre-match-token)
regex matched, because what *was* captured is its mirror image.
The nine leading spaces in my first result prove that the first
regex matched only one space to the left of the token, leaving
the other nine for $MATCH.
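
For anyone who wants to poke at this outside procmail, here is a rough
analogy in Python.  It is only a sketch: Python's re module is not
procmail's regex engine and has no `\/' token, so I am assuming that a
lazy quantifier (`+?', `*?') on the left plus a capture group on the
right is a fair stand-in for "minimal to the left of the token, greedy
to the right."  The variable names are made up for the illustration.

```python
import re

# Same test string as SOMEHEADER in test.rc: ten spaces after the colon.
header = "X-Field-Marker:" + " " * 10 + "Here is some mixed non-space text."

# Stand-in for `: +\/.*`: the lazy ` +?` takes a single space,
# and the greedy group takes everything that is left.
first = re.search(r": +?(.*)", header).group(1)

# Stand-in for `:.*\/[^ ].*`: the lazy `.*?` is forced to skip all the
# spaces, because the group must start on a non-space character.
second = re.search(r":.*?([^ ].*)", header).group(1)

print(f"MYMATCH is >{first}<")
print(f"MYMATCH is >{second}<")
```

Under those assumptions the first result keeps nine leading spaces and
the second keeps none, which mirrors the two MYMATCH lines in the
procmail log.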


And while, thinking about it, there's actually no particular need to trim 
off all the whitespace from the leading edge of the subject, I do like 

Well, yes, there is.  For example, if we are looking for leading
words such as "RE:" in a subject line, and so on.


your approach and shall go with that -- with the minor revision of 
changing the final "+" to a "*" on the offchance that I actually ever 
want to match a one-letter subject line...  (Practically speaking, it 
will probably never matter.)

You don't need to change it for that reason.  It will match a one-
letter subject.  Let's test it:

  2:19pm [~/Mail] 303[1]> ls test*
 test.rc  test2.rc
 
 
 
  2:19pm [~/Mail] 304[0]> diff test*
 8a9
 > SOMEHEADER = "X-Field-Marker:${TENSPACES}X"
 
 
 
  2:19pm [~/Mail] 305[1]> procmail -m VERBOSE=off test2.rc < /dev/null
 
 MYMATCH is >         X<
 
 MYMATCH is >X<
 
The regex "+" matches "one or more."  One is enough.  It will
be matched.
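
A cross-check of the quantifiers, again as a Python sketch rather than
procmail itself (same assumption as before: a lazy left side and a
capture group stand in for the `\/' token; `capture' is a hypothetical
helper).  Note that the diff above only adds a SOMEHEADER line, so the
test2.rc run still uses the `[^$SPACE].*' recipe from test.rc.  A lone
`+' is satisfied by one character, but a tail written as `[^ ].+' is
two atoms, and in this sketch each of them needs a character of its own.

```python
import re

# One-letter "subject", as in test2.rc.
header = "X-Field-Marker:" + " " * 10 + "X"

def capture(right_side, text):
    """Return what the right-hand side captures, or None if no match."""
    m = re.search(":.*?(" + right_side + ")", text)
    return m.group(1) if m else None

one_or_more = capture(r"[^ ]+", header)   # a single `+` atom
star_tail = capture(r"[^ ].*", header)    # the form used in test.rc
plus_tail = capture(r"[^ ].+", header)    # non-space followed by `.+`

print(repr(one_or_more), repr(star_tail), repr(plus_tail))
```

In Python this prints 'X' 'X' None.  Whether procmail agrees on the
last one is worth checking with a recipe that really ends in `.+'.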
 

Given "egrep PATTERN FILE", egrep can determine whether any particular 
line in the FILE matches PATTERN.  What I want to know is whether any 
particular line in the FILE is contained within PATTERN...  I couldn't 
find any egrep option which would do that for me -- hence the PERL 
script...

I'll let someone else tackle that one.


when it did, I might have gone back and searched a little harder;
like I said, though, Michelle's response confirmed what I already
suspected...)

Okay, but I wouldn't use what Michelle does as a litmus test
for what's possible without perl.

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail