procmail

Re: whitespace regex

2001-10-26 13:22:21

sean -

thanks for your help! however, i just tried your two main suggestions:

* ^Subject:.*      [0-9]+

and

* ^Subject:.*[^ ]*.*      [0-9]

and neither worked (when i send an e-mail with 7+ spaces then a number in the subject line, it just sorts it into my inbox as it does regularly). did you test these out on yours before you sent them? are you positive they work on your system? i have a bunch of other filters working so i know procmail is functioning....

thanks so much!!!
-steven


Professional Software Engineering wrote:
At 13:24 2001-10-26 -0400, S. Morgan Friedman wrote:

so i wrote a little procmail command that i hoped would put into the folder all mail that, in the subject line, contained over six-ish white spaces (no

Six in succession, versus a total of six or more.  There's a difference.

:0:
* ^Subject.*([        ]+[0-9]+)
spam

The brackets define a character class, and should contain only one of any given character. When you see things like [ ] in this list, those contain a space and a tab. The plus trailing the character class says "match 1 or more of the previous", so the above would match any messages with a numeric in the subject which followed a space, such as "staff meeting at 2pm".
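The over-matching described above can be checked outside procmail. The following is a minimal sketch that uses Python's `re` module as a stand-in, on the assumption that it agrees with procmail's egrep-style matching for a pattern this simple. The helper name `matches` and the sample subjects (apart from "staff meeting at 2pm") are made up for illustration.

```python
import re

# Stand-in for the recipe's condition. [ \t] is a class holding a space and a tab.
# re.IGNORECASE mirrors procmail's default case-insensitive matching.
pattern = re.compile(r"^Subject.*([ \t]+[0-9]+)", re.IGNORECASE)

def matches(header: str) -> bool:
    """Return True if the header line satisfies the pattern."""
    return pattern.search(header) is not None

ordinary = "Subject: staff meeting at 2pm"             # one space before the digit
spammy = "Subject: make money" + " " * 7 + "4821"      # seven spaces before digits
no_digits = "Subject: no digits here"

print(matches(ordinary))   # True: a single space followed by a digit is enough
print(matches(spammy))     # True
print(matches(no_digits))  # False
```

Because `+` only demands one or more, the pattern cannot tell the ordinary subject from the one with a long run of spaces.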

this didn't work and i then tried variations such as:

[snip] - also incorrect. You should read up on regular expressions. Procmail uses them in the recipes, but they're not some voodoo that was invented as part of procmail.

Have you tried the direct approach:

* ^Subject:.*      [0-9]+

There are six spaces preceding the numeric class. Now, this would match spaces at the beginning of the subject as well, though the first character of whatever followed would have to be numeric. Since you might not want an abundance of leading spaces to trip your rule, you may want to match ANYTHING EXCEPT a space before matching the spaces:
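As a rough check of the direct approach, here is a sketch with Python's `re` standing in for procmail (an assumption; the two should agree on a pattern this simple). The helper `hit` and the sample subjects are hypothetical. Spaces are built with `" " * n` so the counts are not lost to mail clients or archives.

```python
import re

SIX_SPACES = " " * 6
# Six literal spaces, then at least one digit, anywhere after "Subject:".
direct = re.compile(r"^Subject:.*" + SIX_SPACES + r"[0-9]+", re.IGNORECASE)

def hit(header: str) -> bool:
    """Return True if the header line satisfies the direct pattern."""
    return direct.search(header) is not None

seven_in_a_row = "Subject: offer" + " " * 7 + "12345"
three_in_a_row = "Subject: offer" + " " * 3 + "12345"
leading_run = "Subject:" + " " * 7 + "7 reasons"
scattered = "Subject: a b c d e f g 1"   # many spaces, never six in succession

print(hit(seven_in_a_row))  # True: .* absorbs the extra space
print(hit(three_in_a_row))  # False
print(hit(leading_run))     # True: leading spaces also trip the rule
print(hit(scattered))       # False: total count does not matter, only the run
```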

* ^Subject:.*[^ ]*.*      [0-9]

Also note that the plus which was originally following the numeric class is gone -- since the line isn't anchored to the end, it is unnecessary. We want at least one digit, and any more is fluff, so if you eliminate the plus, you'll match as soon as you find one: whether there's one or there are five, there will be one to match.
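The point about the redundant plus can be sketched as follows, again with Python's `re` as an assumed stand-in for procmail's matcher. The names `with_plus`, `without_plus`, and `same_verdict` and the sample subjects are illustrative only.

```python
import re

SIX_SPACES = " " * 6
with_plus = re.compile(r"^Subject:.*" + SIX_SPACES + r"[0-9]+", re.IGNORECASE)
without_plus = re.compile(r"^Subject:.*" + SIX_SPACES + r"[0-9]", re.IGNORECASE)

samples = [
    "Subject: offer" + " " * 7 + "12345",
    "Subject: offer" + " " * 7 + "1",
    "Subject: offer" + " " * 7 + "none",
]

def same_verdict(header: str) -> bool:
    """True if both patterns agree on whether the header matches."""
    return bool(with_plus.search(header)) == bool(without_plus.search(header))

for header in samples:
    print(repr(header), same_verdict(header))

# Only the amount of text consumed differs, not the yes/no answer.
print(repr(with_plus.search(samples[0]).group(0)))
print(repr(without_plus.search(samples[0]).group(0)))
```

For a recipe that only needs a yes/no decision, the two forms are interchangeable; the difference would matter only if the matched text were being extracted.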

This would say "match anything in the subject up to something that ISN'T a space, then match anything at all, six spaces, then a number", followed by whatever (we're not using an EOL anchor to require that the numbers appear at the VERY end of the line, so anything could follow the number -- more numbers, letters, whitespace, whatever).
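One caveat when reading that description: `[^ ]*` means zero or more non-space characters, so it is also satisfied by nothing at all. The sketch below (Python's `re` as an assumed stand-in for procmail) compares the guarded pattern with the plain one on a subject that starts with a run of spaces. The `strict` pattern, which drops the star so that one non-space character is required, is a hypothetical variant for comparison and does not come from the thread.

```python
import re

SIX = " " * 6
plain = re.compile(r"^Subject:.*" + SIX + r"[0-9]", re.IGNORECASE)
guarded = re.compile(r"^Subject:.*[^ ]*.*" + SIX + r"[0-9]", re.IGNORECASE)
# Hypothetical variant: [^ ] without the star must consume a real character.
strict = re.compile(r"^Subject:.*[^ ].*" + SIX + r"[0-9]", re.IGNORECASE)

leading_run = "Subject:" + " " * 7 + "7 reasons"
inner_run = "Subject: offer" + " " * 7 + "12345"

for name, rx in (("plain", plain), ("guarded", guarded), ("strict", strict)):
    print(name, bool(rx.search(leading_run)), bool(rx.search(inner_run)))
# plain   True  True
# guarded True  True   ([^ ]* matched the empty string)
# strict  False True
```

Under these assumptions the guarded form accepts the same lines as the plain form, so leading spaces still trip it.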

Of course, SPACE in the above examples is very literally a SPACE. If you wanted to match SPACE or TAB, then you'd replace the individual spaces with individual character classes containing a space and a tab:

* ^Subject:.*[^         ]*.*[   ][      ][      ][      ][      ][      ][0-9]

Some of the bracketed bits will appear wider in this email, and may be translated by your email client to be all spaces, but they are indeed all just a space and a tab.
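Since the tabs in the recipe above may not survive mail clients or archives, here is a sketch that builds the same shape of pattern programmatically. Python's `re` is an assumed stand-in for procmail, and the names `WS`, `either`, and `spaces_only` are illustrative. The `\t` in the Python string becomes a literal tab character inside the class, which mirrors typing the tab literally in the recipe.

```python
import re

WS = "[ \t]"  # a class holding one literal space and one literal tab

# Six space-or-tab classes in a row, then a digit.
either = re.compile(r"^Subject:.*" + WS * 6 + r"[0-9]", re.IGNORECASE)
# The literal-space version, for comparison.
spaces_only = re.compile(r"^Subject:.*" + " " * 6 + r"[0-9]", re.IGNORECASE)

mixed = "Subject: offer" + " \t" * 3 + "1"     # space, tab, space, tab, space, tab
short = "Subject: offer" + " \t " + "1"        # only three whitespace characters

print(bool(either.search(mixed)))       # True
print(bool(spaces_only.search(mixed)))  # False: the tabs break the run of spaces
print(bool(either.search(short)))       # False
```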

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail


