Re: Stopping A Condition Found In A Condition

2001-03-09 23:09:00
At 19:09 -0700 09 Mar 2001, SoloCDM <deedsmis(_at_)aculink(_dot_)net> wrote:
> The following is what I have attempted:

First, some comments on your regular expressions (in the style used for
commenting perl regexps).

* ! H ?? ^Subject:(.*|*)([0-9A-Z])?OT([0-9A-Z])?

  ^Subject:
  (.*|*)        # Is this really meant to match a literal `*'?  And even
                #   if it is, why match that separately from the `.*' part?
  ([0-9A-Z])?   # The parentheses are unnecessary; the `?' can be applied
                #   to the character class itself.  Also, because this
                #   optional item follows the previous `.*', it has no
                #   effect on what matches.
  OT
  ([0-9A-Z])?   # Again, the parentheses are unnecessary.  And again
                #   the whole thing is rather meaningless because it's
                #   optional and at the end of the expression.

So really, the whole thing is basically equivalent to:

* ! H ?? ^Subject:.*OT

unless the lone `*' was supposed to actually mean something.
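
A quick way to check that equivalence is to run both forms over a few sample subjects. The sketch below uses Python's re module as a stand-in for procmail's matcher, which is an assumption on my part: the syntax agrees for these patterns, re.IGNORECASE mimics procmail's default case-insensitivity, and the lone `|*' alternative is dropped because Python rejects it. The sample subjects are made up.

```python
import re

# Original condition with the `|*' alternative dropped (Python rejects a bare `*').
ORIGINAL = re.compile(r"^Subject:.*([0-9A-Z])?OT([0-9A-Z])?", re.IGNORECASE)
# Simplified form suggested in the message.
SIMPLIFIED = re.compile(r"^Subject:.*OT", re.IGNORECASE)

# Made-up subjects, for illustration only.
SAMPLES = [
    "Subject: [OT] lunch plans",
    "Subject: Hotel rates",
    "Subject: 2OT3 widget",
    "Subject: Off-Topic banter",
    "Subject: procmail recipes",
]

def same_verdict(line):
    """True if both patterns agree on whether the line matches."""
    return bool(ORIGINAL.search(line)) == bool(SIMPLIFIED.search(line))

for line in SAMPLES:
    print(f"{line!r}: agree={same_verdict(line)}")
```

Both patterns accept any subject containing `ot' anywhere, `Hotel' included, since the optional character classes never constrain anything.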

* H ?? ^Subject:\/(.*|*)(\[|\()?(O(/)?T|Off( |-)?Topic)(:| -|\]|\))?(.*|*)

  ^Subject:\/
  (.*|*)                # Same as above
  (\[|\()?              # This would be clearer and more efficient as a
                        #   character class  `[[(]?'
  (O(/)?T               # Parentheses around the `/' are unnecessary.
    |Off( |-)?Topic)    # Again, a character class would be better `[ -]?'
  (:| -|\]|\))?         # The `]' isn't special, so it doesn't need to
                        #   be quoted.
  (.*|*)                # Again, same as above

With those changes applied, the condition becomes:

* H ?? ^Subject:\/.*[[(]?(O/?T|Off[ -]?Topic)(:| -|]|\))?.*

> The problem is compounded by "ot OR OT" getting ignored if an "ot or
> OT" with characters or numbers on either side are in the same line.

This is why it's best to come up with a regexp that matches only the
desired patterns, rather than an overbroad regular expression paired
with another regexp that acts as a set of exceptions.  The other option
is to use scoring.
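
To see how the broad-pattern-plus-exception approach goes wrong, here is a rough sketch that applies both simplified conditions to a few made-up subjects. It assumes Python's re module is a fair stand-in for procmail's matcher: re.IGNORECASE mimics procmail's default case-insensitivity, `\/' is left out, and `[[(]' and `]' are written with backslashes to keep Python happy.

```python
import re

FLAGS = re.IGNORECASE  # procmail conditions ignore case unless the D flag is set

# Exclusion condition in its simplified form (negated in the recipe).
EXCLUDE = re.compile(r"^Subject:.*OT", FLAGS)
# Broad inclusion condition in its simplified form.
INCLUDE = re.compile(
    r"^Subject:.*[\[(]?(O/?T|Off[ -]?Topic)(:| -|\]|\))?.*", FLAGS
)

def passes(line):
    """Mimic the two conditions together: `! EXCLUDE' and then `INCLUDE'."""
    return not EXCLUDE.search(line) and bool(INCLUDE.search(line))

# Made-up subjects, for illustration only.
for line in ("Subject: [OT] lunch plans",
             "Subject: Hotel rates",
             "Subject: O/T lunch plans"):
    print(line, "| include:", bool(INCLUDE.search(line)),
          "| passes:", passes(line))
```

The inclusion pattern alone accepts `Hotel rates', and the exclusion then throws out every subject containing `ot', the genuine `[OT]' tag among them.  Of these three, only the `O/T' subject gets through both conditions.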

I think the following regexp should get pretty close to the desired
results:

* H ?? ^Subject:\/((.*[^a-z0-9]|)OT(]|\)|:| -)|.*(Off[ -]?Topic|O/T)).*

Here it is again in a more readable format:

  ^Subject:\/
  (
      (.*[^a-z0-9]|)       # If there's leading text, it must end with a
                           #   non-alphanumeric character
      OT(]|\)|:| -)        # `OT' followed by `]', `)', `:' or ` -'
    |
      .*                   # Any leading text is OK here
      (Off[ -]?Topic|O/T)  # `O/T' or various things close to `Off-topic'
                           #   Trailing punctuation left out, since it was
                           #   optional.  It'll get picked up by the last
                           #   part.
  )
  .*                       #  Followed by anything
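
If you want to try that regexp before putting it in a recipe, something like the following sketch may help.  It again assumes Python's re module behaves like procmail's matcher for this pattern: re.IGNORECASE stands in for procmail's default case-insensitivity, `\/' is omitted since it only marks where MATCH begins, and `]' is escaped for Python's benefit.  The function name and sample subjects are made up.

```python
import re

# Python rendering of the suggested condition (see the assumptions above).
OFF_TOPIC = re.compile(
    r"^Subject:((.*[^a-z0-9]|)OT(\]|\)|:| -)|.*(Off[ -]?Topic|O/T)).*",
    re.IGNORECASE,
)

def is_off_topic(line):
    """True if the header line would satisfy the condition."""
    return bool(OFF_TOPIC.search(line))

# Made-up subjects, for illustration only.
SHOULD_MATCH = [
    "Subject: [OT] lunch plans",
    "Subject: Re: (OT) lunch plans",
    "Subject: OT: lunch plans",
    "Subject: OT - lunch plans",
    "Subject: Off-Topic banter",
    "Subject: off topic banter",
    "Subject: O/T banter",
]
SHOULD_NOT_MATCH = [
    "Subject: Hotel rates",
    "Subject: robot: uprising",
    "Subject: OT lunch plans",   # bare OT with no trailing punctuation
    "Subject: procmail recipes",
]

for line in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(f"{is_off_topic(line)!s:5} {line}")
```

Note that a bare `OT' with no trailing `]', `)', `:' or ` -' is not matched, which is what keeps ordinary words from triggering it.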

-- 
Aaron Schrab     aaron(_at_)schrab(_dot_)com      http://www.execpc.com/~aarons/
 Though I'll admit readability suffers slightly...   --Larry Wall
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail