Re: whitespace regex

2001-10-26 15:12:05
On 26 Oct, Brent Loertscher wrote:
| Steven,
| 
|    I think that this is closer to what you want.  It may need some fine
| tuning, but it is a little more precise:
| 
| * ^Subject:.*\s\{7,}[0-9]*
| 
| \s matches whitespace characters, like space and tabs.
| 
| \{7,} means match 7 or more of the previous character, you may want to
| change the 7 to something else depending on what you are trying to
| match.  This bracket expression is pretty versatile and can be used to
| provide an upper and lower limit.  \{,7} is up to seven.  \{1,7} a range
| of between 1 and 7 (inclusive).  \{7} means exactly 7.
|
| If the numbers are definitely at the end of the line, you might want to
| try
| 
| * ^Subject:.*\s\{7,}[0-9]*$
| 
| just to make sure that you are matching the final numbers and not anything
| else.
| 

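One small problem. Procmail doesn't grok \s or \{n,m}, so this won't
work - at least not the way you want it to. A backslash in a procmail
regex just quotes the next character, so \s ends up matching a literal
"s" and \{7,} a literal "{7,}".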
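
For illustration only, a procmail-friendly take on the quoted condition
could spell out the whitespace class and repeat it by hand. This is a
rough sketch, not a tested recipe: the WS variable and the
"possible-spam" folder name are placeholders of mine, and the brackets
must hold one literal space and one literal tab.

```procmail
# WS holds one literal space and one literal tab inside the brackets.
WS='[ 	]'

# The $ after the * turns on variable expansion in the condition.
# Seven copies of $WS stand in for \s\{7,}; the trailing $WS* absorbs
# any additional whitespace before the optional digits.
# "possible-spam" is a hypothetical mailbox name.
:0
* $ ^Subject:.*$WS$WS$WS$WS$WS$WS$WS$WS*[0-9]*$
possible-spam
```

Since [0-9]* also matches zero digits, this fires on any Subject that
ends in seven or more spaces/tabs, with or without numbers after them.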

Sean Straw already gave the answer to the question, though I noticed
there was a followup question so maybe there's still an issue.  FWIW,
what follows goes beyond what was asked, because I don't think the
original observation about the pattern is comprehensive enough. 

The original question was matching "a whole string of black white
spaces, often followed by a few letters or sometimes letters."  I'm
going to take that to mean "blank spaces" and "letters or sometimes
numbers". My experience is this was prevalent some months ago and is
infrequent today. Maybe others' experiences differ. I also don't
recall seeing any without the trailing garbage beyond the whitespace.
When I was seeing a lot of these, they were indeed alpha-numeric (not
either/or), were sometimes enclosed in brackets, and sometimes had a
leading or trailing dash. There was at least one that snuck in some
trailing whitespace.  This is what I use, but first the disclaimer that
it's flagged very few messages since implemented because I haven't seen
many. AFAIK it hasn't missed any.

WS='[ 	]'   # a space and a tab inside the brackets
WSx10="$WS$WS$WS$WS$WS$WS$WS$WS$WS$WS"

:0
* $ ^Subject:.*${WSx10}${WSx10}${WSx10}$WS*[[-]?[a-z0-9]+[]-]?$WS*$
{ Insert your chosen spam action here }

[The braces in ${WSx10} are aesthetic only.]

As is, it requires at least 30 space/tab characters. I'm sure that was
arrived at after examining a couple dozen messages, but again this
hasn't been tripped in a while so that number may be inappropriate.
Changing it should be an obvious exercise. It matches (after the
whitespace):

  [[-]?      an optional opening bracket or dash
  [a-z0-9]+  1 or more alpha-numerics
  []-]?      an optional closing bracket or dash
  $WS*$      optional space(s) and/or tab(s) before the end of the line
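
Changing the threshold amounts to adding or dropping copies of the
variables. As a sketch only, a 25-character minimum could be built
from two WSx10 plus a five-wide helper. The WSx5 name, the figure of
25, and the "possible-spam" folder are made up for the example; pick
whatever your own sample of messages suggests.

```procmail
# One literal space and one literal tab inside the brackets.
WS='[ 	]'
# Hypothetical helper: five whitespace characters in a row.
WSx5="$WS$WS$WS$WS$WS"
WSx10="$WSx5$WSx5"

# At least 25 space/tab characters (10 + 10 + 5) instead of 30,
# followed by the same optional-bracket/alphanumeric tail as above.
# "possible-spam" is a placeholder mailbox name.
:0
* $ ^Subject:.*${WSx10}${WSx10}${WSx5}$WS*[[-]?[a-z0-9]+[]-]?$WS*$
possible-spam
```

Procmail matches case-insensitively unless the D flag is given, so
[a-z0-9] covers uppercase letters here as well.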


[non-working email address - replace procmail w/ procmail6 if you must.]
-- 
                   /"\
Don Hammond        \ /     ASCII Ribbon Campaign
Raleigh, NC US      X        Against HTML Mail,
                   / \      and News Too

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
