Re: debugging \<(xxx|yyy|zzz)\>

On Fri, May 28, 2004 at 6:20:09PM -0500, David W. Tamkin wrote:

What I didn't mean was everything to the right of there, because
even with the typo corrected, it wouldn't have worked anyway.
Forget I suggested it.


Oops!  I threw your suggestion, loosely, into my sandbox and it did
work, so I ran with it.  Now I see that my little change masked the problem.

So, there's no way to put a pair of bounding regexps at the ends
of a list?


Of course there is; you already had it in the regexp you were trying.


That's fine when I'm generating the string.

What there ISN'T is a way to count both of two overlapping matches to a 
regexp, unless the overlap consists of a single newline.

I've run into a very similar problem before, trying to count words where 
there was a chance that two of the words I was looking for might be 
separated by only a space.  I don't remember finding a solution.


When the spammer is in control of the string, then, there's no way to
count "wild teens" as two words with, say, (\<wild\>|\<teens?\>),
and if I remove the word delimiters I risk counting "wilderness in
thirteen" as two bad words.  Hmmmmm....
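
For anyone following along, here is a rough sketch of the trade-off in
Python rather than procmail.  It rests on an assumption: that procmail's
\< and \> each consume the character next to the word, which I model
with the consuming groups (?:^|\W) and (?:\W|$).  Python's re module is
not procmail's engine, the names below are made up for illustration, and
the single-newline case David mentions is not modeled.

```python
import re

# Assumption: procmail's \< and \> each eat one non-word character
# (or sit at the edge of the text); modeled with consuming groups.
CONSUMING = re.compile(r"(?:^|\W)(?:wild|teens?)(?:\W|$)")

# Zero-width word boundaries (Perl/Python style) consume nothing.
ZERO_WIDTH = re.compile(r"\b(?:wild|teens?)\b")

# The same word list with the delimiters removed.
BARE = re.compile(r"wild|teens?")


def count(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))


for text in ("wild teens", "wilderness in thirteen"):
    print(text, count(CONSUMING, text), count(ZERO_WIDTH, text),
          count(BARE, text))
```

In this model the space after "wild" is used up by the first match, so
"teens" has nothing left in front of it and the consuming pattern counts
one word.  The bare pattern counts both, but it also counts two in
"wilderness in thirteen".  Zero-width boundaries avoid both problems,
but that is a feature of Python's engine, not something this thread
shows procmail to have.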

If you didn't find a solution, I'm sure not likely to! :)

I like my spam filters to be conservative, so I can trust that I'm
not tossing Uncle Fluffy's family newsletter out with the other
/dev/null stuff, so maybe I'll just leave the spam word lists
delimited at their ends as they are, and remember to always generate
things internally with these points in mind. 

Thanks again!

Jim

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail