procmail

Re: Weird regex Behavior

1997-01-14 13:37:07
James Di Toro asked,

|       I'm trying to filter out the Re: in front of mails and am getting
| problems.  The following recipe won't work:
| 
| :0 f
| * ^Subject: [rR][eE]\:*\/.*
| | formail -I "Subject: $MATCH"

First, you don't need [rR][eE]; unless you set the `D' flag, procmail regexps
are case-insensitive.  Second, you don't need to escape a colon, though doing
so did no harm.  Third, there's no need to run the body through the filter, so
we can reduce the load by filtering only the head.  So let's simplify a
little and continue from there:

  :0hf
  * ^Subject: re:*\/.*
  | formail -I "Subject: $MATCH"

|       It leaves the ':' in the [new] subject.

Yes, because when there is more than one place to divide the text with \/,
procmail makes the left side as short as possible and the right side as long
as possible.  ":*" means "zero or more colons."  ".*" to the right matches
anything, including a string that begins with a colon (or nothing, if the
subject is just "re" and no more).  So if the text is

  Subject: Revolution begins at 0600

and the condition is

  * ^Subject: re:*\/.*

then MATCH="volution begins at 0600".

If the text is
  
  Subject: re: rest of subject

with the same condition, the shortest match to "re:*" is "re" and the longest
match to ".*" is ": rest of subject".  Yes, "re:" is a match to "re:*" and
" rest of subject" is a match to ".*", but procmail, faced with a choice,
will make the left side as short as it can and the right side as long as it
can.
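
To watch the split happen, a small scratch rcfile can log whatever \/
captured.  This is only a sketch: the NL variable is a helper introduced
here for illustration, and the rcfile is assumed to be run against saved
test messages rather than live mail.

```procmail
# Scratch rcfile for experimenting with \/ and $MATCH.
# NL is a helper variable (not part of the original recipe) that
# holds a newline so each log entry ends cleanly.
NL="
"

# No delivery happens here: when the condition matches, the braces
# only write what \/ captured to the log.
:0
* ^Subject: re:*\/.*
{ LOG="MATCH=[$MATCH]$NL" }
```

Going by the explanation above, the two example subjects should log
MATCH=[volution begins at 0600] and MATCH=[: rest of subject].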

| Or is the '\:*' getting sucked into the '\/.*' as one regex that goes into
| the MATCH variable?

That's it exactly.

| It's quite annoying because '\:\/.*' works just fine
| and the colon gets taken out and all I'm left with is the subject.

Right, because the only possible match to ":" (you don't have to write "\:")
is a colon, so the colon in the text is not included in $MATCH.  However,
if you had no asterisk after the colon and then received a subject like this:

  Subject: Re::::: some stuff here

only one colon would stay out of $MATCH, and the other four would get in.

My guess is that you want something like this:

  :0hf
  * ^Subject: re:+ *\/[^ ].*
  | formail -I "Subject: $MATCH"
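
One case worth checking against the shortest-left rule described earlier:
for a subject such as "Re::::: some stuff here", "re:+" needs only one
colon, and the next colon satisfies "[^ ]", so the remaining colons would
presumably still land in $MATCH.  A hedged variant, assuming you never want
the new subject to begin with a colon, also excludes the colon from the
first captured character:

```procmail
# Variant sketch: [^ :] makes the captured text start at the first
# character that is neither a space nor a colon, so the left side
# has to absorb the whole run of colons and spaces.
:0hf
* ^Subject: re:+ *\/[^ :].*
| formail -I "Subject: $MATCH"
```

The trade-off is that a subject whose text really does begin with a colon
after the spaces (say "Re: :-)") no longer matches the condition at all,
so it is left untouched.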
