procmail

Re: whitespace regex

2001-10-26 14:11:19

Thanks, I didn't know about \s.  It doesn't appear to be mentioned in
the man page.

Here's another way to do this.

:0 HcW
*  ! ^\/Subject:.*spam
| /usr/bin/grep '^Subject' | /usr/bin/grep -vqE '^Subject: .*(.)\1\1\1\1\1[^\1]'

:0 efwh:
| formail -A "X-spamtrap: $MATCH"

This unfortunately runs an external program, so it's higher load, but it
does let you take advantage of regexp backreferences in egrep in order
to match subject lines with the same character repeated many times.  I
was getting spam that had high-bit whitespace (i.e. 0xA0) which I couldn't
match in a recipe.  The above rules let me catch it.  I'm doing a similar
one to catch "Received" lines with repeated characters, which are always
from spammers relying on a particular bug in old versions of sendmail.
The details of that recipe are left as an exercise for the reader.  :)
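
The backreference test can be tried on its own, outside procmail. What follows is a minimal sketch, assuming a grep that accepts backreferences with -E (GNU grep does; POSIX only promises them for basic regexps). The function name has_repeated_run is made up for illustration. The sketch leaves out the trailing [^\1] from the recipe: inside a bracket expression, \1 is not a backreference but the two literal characters backslash and 1.

```shell
#!/bin/sh
# has_repeated_run (hypothetical name): reads header lines on stdin and
# exits 0 if a Subject line contains the same character six times in a row.
has_repeated_run() {
    # LC_ALL=C asks grep to work on bytes, so a high-bit byte such as
    # 0xA0 is treated like any other character.
    LC_ALL=C grep -qE '^Subject: .*(.)\1\1\1\1\1'
}

# Six spaces in a row count as a repeated character too.
if printf 'Subject: cheap meds      9481\n' | has_repeated_run; then
    echo "repeated run found"
fi
```

Note that the exit status here is the reverse of the recipe's grep -v: this helper succeeds when the run is present.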


On Fri, Oct 26, 2001 at 02:44:00PM -0600, Brent Loertscher wrote:

Steven,

   I think that this is closer to what you want.  It may need some fine
tuning, but it is a little more precise:

* ^Subject:.*\s\{7,}[0-9]*

\s matches whitespace characters, like space and tabs.

\{7,} means match 7 or more of the previous character; you may want to
change the 7 to something else depending on what you are trying to
match.  This interval expression is pretty versatile and can be used to
provide an upper and lower limit.  \{,7} is up to seven.  \{1,7} is a range
of between 1 and 7 (inclusive).  \{7} means exactly 7.
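
The interval forms can be checked quickly with grep. This is only a sketch under assumptions: it uses grep -E, where the braces are written without backslashes, and [[:space:]] stands in for \s. It shows how grep counts, and says nothing about whether procmail's own matcher accepts the same syntax. The helper names spaces and ws_run_matches are hypothetical.

```shell
#!/bin/sh
# spaces N: print N space characters (generated, so the count is exact).
spaces() { printf "%${1}s" ''; }

# ws_run_matches INTERVAL TEXT: exits 0 if TEXT contains 'a', then a run
# of whitespace whose length satisfies INTERVAL, then 'b'.
ws_run_matches() {
    printf '%s\n' "$2" | grep -qE "a[[:space:]]$1b"
}

ws_run_matches '{7,}'  "a$(spaces 7)b" && echo '{7,}  accepts 7 spaces'
ws_run_matches '{1,7}' "a$(spaces 3)b" && echo '{1,7} accepts 3 spaces'
ws_run_matches '{7}'   "a$(spaces 8)b" || echo '{7}   rejects 8 spaces'
```

The a and b delimiters are there so that the run length is pinned down; without something on both sides, a longer run always contains a shorter one and every count would appear to match.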

If the numbers are definitely at the end of the line, you might want to
try

* ^Subject:.*\s\{7,}[0-9]*$

just to make sure that you are matching the final numbers and not anything
else.
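
The effect of the trailing $ is easy to see with grep. Again a sketch, assuming grep -E syntax as a stand-in for the recipe condition; the helper name matches is hypothetical. Without the anchor, [0-9]* is free to match nothing, so any text may follow the whitespace run. With the anchor, only digits may sit between the run and the end of the line.

```shell
#!/bin/sh
unanchored='^Subject:.*[[:space:]]{7,}[0-9]*'
anchored='^Subject:.*[[:space:]]{7,}[0-9]*$'

# matches PATTERN LINE: exits 0 if LINE matches PATTERN.
matches() { printf '%s\n' "$2" | grep -qE "$1"; }

gap=$(printf '%8s' '')                       # eight spaces
words="Subject: free offer${gap}call now"    # text after the run
digits="Subject: free offer${gap}12345"      # digits after the run

matches "$unanchored" "$words"  && echo "unanchored: text after the run is accepted"
matches "$anchored"   "$words"  || echo "anchored:   text after the run is rejected"
matches "$anchored"   "$digits" && echo "anchored:   digits after the run are accepted"
```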

Hope that this helps.

Brent



On Fri, 26 Oct 2001, S. Morgan Friedman opined:


sean -

thanks for your help! however, i just tried your two main suggestions:

* ^Subject:.*      [0-9]+

and

* ^Subject:.*[^ ]*.*      [0-9]

and neither worked (when i send an e-mail with 7+spaces then a number in
the subject line, it just sorts it into my inbox as it does regularly). did
you test these out on yours before you sent them? are you positive they
work on your system? i have a bunch of other filters working so i know
procmail is functioning....

thanks so much!!!
-steven


Professional Software Engineering wrote:
At 13:24 2001-10-26 -0400, S. Morgan Friedman wrote:

so i wrote a little procmail command that i hoped would put into the
folder all mail that, in the subject line, contained over six-ish white
spaces (no

Six in succession, versus a total of six or more.  There's a difference.

:0:
* ^Subject.*([        ]+[0-9]+)
spam

The brackets define a character class, and should contain only one of any
given character.  When you see things like [  ] in this list, those
contain a space and a tab.  The plus trailing the character class says
"match 1 or more of the previous", so the above would match any messages
with a numeric in the subject which followed a space, such as "staff
meeting at 2pm".
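
The over-matching is easy to reproduce with grep -E, used here as a rough stand-in for the recipe condition. In this sketch the variable names are hypothetical, and the tab inside the class is produced by printf so that it cannot be silently turned into spaces along the way.

```shell
#!/bin/sh
tab=$(printf '\t')
# One or more of (space or tab), then one or more digits.
loose="^Subject.*([ ${tab}]+[0-9]+)"

# A perfectly ordinary subject still matches, because " 2" satisfies
# "at least one blank, at least one digit"; grep prints the line.
printf 'Subject: staff meeting at 2pm\n' | grep -E "$loose"
```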

this didn't work and i then tried variations such as:

[snip] - also incorrect.  You should read up on regular
expressions.  Procmail uses them in the recipes, but they're not some
voodoo that was invented as part of procmail.

Have you tried the direct approach:

* ^Subject:.*      [0-9]+

There are six spaces preceding the numeric class.  Now, this would match
spaces at the beginning of the subject as well, though the first character
of whatever followed would have to be numeric.  Since you might not want
an abundance of leading spaces to trip your rule, you may want to match
ANYTHING EXCEPT a space before matching the spaces:

* ^Subject:.*[^ ]*.*      [0-9]

Also note that the plus which originally followed the numeric class
is gone.  Since the line isn't anchored at the end, it is unnecessary:
we want at least one digit, and any more is fluff.  If there's one digit,
or there are five, there will always be one to match.

This would say "match anything in the subject up to something that ISN'T a
space, then match anything at all, six spaces, then a number", followed by
whatever.  We're not using an EOL anchor to require that the numbers appear
at the VERY end of the line, so anything could follow the number: more
numbers, letters, whitespace, whatever.
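
Both conditions can be compared side by side with grep -E, used here as an approximation of procmail's matcher. The sketch below rests on that assumption; the helper matches and the variable names are hypothetical. One thing it brings out: under grep's rules [^ ]* is allowed to match nothing at all, so the second pattern accepts a subject that starts with a run of spaces just as the first one does. The pattern called strict, with the star dropped, is a variant that is not taken from this thread; it insists on at least one non-space character before the run.

```shell
#!/bin/sh
six=$(printf '%6s' '')                      # exactly six spaces
direct="^Subject:.*${six}[0-9]"
guarded="^Subject:.*[^ ]*.*${six}[0-9]"
strict="^Subject:.*[^ ].*${six}[0-9]"       # variant, not from the thread

# matches PATTERN LINE: exits 0 if LINE matches PATTERN.
matches() { printf '%s\n' "$2" | grep -qE "$1"; }

leading="Subject: ${six}5"                  # only spaces before the digit
inner="Subject: order${six}5"               # a word, then the run

for name in direct guarded strict; do
    eval "pat=\$$name"
    matches "$pat" "$leading" && echo "$name matches leading spaces"
    matches "$pat" "$inner"   && echo "$name matches spaces after a word"
done
```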

Of course, SPACE in the above examples is very literally a SPACE.  If you
wanted to match SPACE or TAB, then you'd replace the individual spaces
with individual character classes containing a space and a tab:

* ^Subject:.*[^         ]*.*[   ][      ][      ][      ][      ][      ][0-9]

Some of the bracketed bits will appear wider in this email, and may be
translated by your email client to be all spaces, but they are indeed all
just a space and a tab.
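
Since mail clients tend to mangle tabs, it can help to build the class from a variable when experimenting. A sketch with grep -E, under the assumption that it behaves like the recipe condition for this simple pattern; the variable names are hypothetical.

```shell
#!/bin/sh
tab=$(printf '\t')
ws="[ ${tab}]"                              # one character: space or tab
pattern="^Subject:.*${ws}${ws}${ws}${ws}${ws}${ws}[0-9]"

# Tab, space, tab, space, tab, space, then a digit: six blanks in a row.
printf 'Subject: invoice\t \t \t 7\n' | grep -qE "$pattern" && echo "match"
```

Writing the class once and repeating the variable keeps the six copies identical, which is hard to verify by eye when each one is a literal space and tab.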

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail



-- 
----------------------------------------------------------------------
Brent Loertscher                              |  It is impossible to
Test Engineer, CPG                            |  make things foolproof
Fairchild Semiconductor, Salt Lake            |  because fools are
Internet: cbtlsl(_at_)slcad01(_dot_)fairchildsemi(_dot_)com    |  so ingenious.

2001-10-26 14:30


-- 
  Paul Chvostek                                      <paul(_at_)it(_dot_)ca>
  Operations / Development / Abuse / Whatever       vox: +1 416 598-0000
  IT Canada                                            http://www.it.ca/
