procmail

Re: REGEX

2000-12-18 22:43:55
On 19 Dec, Michael/ION PL Admin wrote:
| 
| I need a regular expression to stop messages which
| start with 
| ^Subject:
| 
| and have more than 5 spaces followed by some letters.
| 
| In other words, I'm trying to stop messages with subjects similar to:
| 
| Join For Free!                             zwpdt 
| Great Sex in a Bottle                      lkpie
| Enhance your Sexual Experience  Naturally                    aygtd
| 
| etc.
| 
| I tried variations of:
| 
| ^Subject:.*\s{5,}.*
| 
| without any success
| 

Procmail's regular expression engine doesn't support all the extended
syntax of egrep and perl (at least not in my version). As far as I can
tell, neither "\s" nor the "{5,}" repeat count is recognized, which
would explain why your attempts failed. You can see the supported
syntax in man procmailrc; search for "Extended regular expressions".
If you will be happy with the procmail equivalent of what you've given
above, it would look like:

* ^Subject:.*[ ][ ][ ][ ][ ]

where each bracket pair encloses a single space character. The trailing
".*" is unnecessary, unless you need it for $MATCH. (I'm bracketing the
spaces on the assumption that trailing whitespace on a condition line
would otherwise be ignored. Even if it isn't, the brackets make the
recipe clearer when you're looking at it months from now.)
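
As a rough sanity check of the five-space expression, here is a minimal
Python sketch. Python's re module is not procmail's matcher, so this
only exercises the logic of the pattern, not procmail itself. The
function name and the sample header lines are made up for illustration,
and re.IGNORECASE stands in for procmail's default case-insensitive
matching.

```python
import re

# Same expression as the procmail condition, minus the leading "* ".
# Assumption: Python's re is only an approximation of procmail's engine.
FIVE_SPACES = re.compile(r"^Subject:.*[ ][ ][ ][ ][ ]", re.IGNORECASE | re.MULTILINE)


def has_five_space_run(header: str) -> bool:
    """Return True if a Subject: line contains five consecutive spaces."""
    return FIVE_SPACES.search(header) is not None


if __name__ == "__main__":
    samples = [
        "Subject: Join For Free!                             zwpdt",
        "Subject: Re: REGEX",
    ]
    for line in samples:
        print(has_five_space_run(line), repr(line))
```

Any Subject: line with a run of five or more spaces passes this check,
which is why the tighter variants are worth considering.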

But you might want to consider a couple more things. Are you positive
these are spaces, or might there be tabs? If tabs are possible, you
would want something like:

* ^Subject:.*( |	)( |	)( |	)( |	)( |	)

where there is a space and a tab in each alternation.  Lastly, I
suppose it's unlikely that this would generate false positives, but
part of the unique identification of this spam is the trailing
characters after the white space. And the ones I'm seeing are enclosed
in brackets. So you might want to clean this up further like so:

* ^Subject:.*( |	)( |	)( |	)( |	)( |	)\[[a-z0-9]+\]

which will match your five spaces (or tabs), followed by one or more
alphanumerics enclosed in brackets. If you don't want the brackets you
can just remove both of the escaped ones above. If they're optional
(i.e. you want to match with or without the brackets), add a question
mark "?" after the first escaped bracket and drop the last one. If you
want to tighten it up even more, add '( |	)*$' (space and tab again)
at the end; in that case keep the last escaped bracket and give it a
"?" as well, so that a closing bracket can still sit just before the
end of the line.
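
To compare the variants described above, here is a small Python sketch
that assembles each one and tries it on a few subjects. It is only an
approximation: Python's re is a different engine from procmail's, "\t"
stands in for the literal tab you would type in the recipe, and
build_pattern with its "required" / "optional" / "none" options is a
hypothetical helper, not anything procmail provides. The "optional"
form makes both brackets optional so that it still works with the
end-of-line anchor.

```python
import re

WS = r"( |\t)"  # one space or one tab, mirroring each alternation above


def build_pattern(brackets: str = "required", anchor_end: bool = False) -> re.Pattern:
    """Assemble a Python approximation of the Subject: condition.

    brackets: "required", "optional" or "none" (hypothetical option names).
    anchor_end: if True, allow only spaces/tabs between the token and end of line.
    """
    tails = {
        "required": r"\[[a-z0-9]+\]",
        "optional": r"\[?[a-z0-9]+\]?",
        "none": r"[a-z0-9]+",
    }
    if brackets not in tails:
        raise ValueError(f"unknown brackets option: {brackets!r}")
    pattern = r"^Subject:.*" + WS * 5 + tails[brackets]
    if anchor_end:
        pattern += WS + r"*$"
    # IGNORECASE approximates procmail's default case-insensitive matching.
    return re.compile(pattern, re.IGNORECASE | re.MULTILINE)


if __name__ == "__main__":
    loose = build_pattern("optional", anchor_end=True)
    for subject in (
        "Subject: Great Sex in a Bottle                      lkpie",
        "Subject: Join For Free!     [zwpdt] ",
        "Subject: Re: REGEX",
    ):
        print(bool(loose.search(subject)), repr(subject))
```

The anchored forms reject a subject that merely has a wide gap in the
middle, since the token after the whitespace must be the last thing on
the line.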

Don


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
