On Fri, May 28, 2004 at 09:07:30PM -0700, Jim Osborn wrote:
> When the spammer is in control of the string, then, there's no way to
> count "wild teens" as two words with, say, (\<wild\>|\<teens?\>)
> without risking counting "wilderness in thirteen" as two bad words
> if I remove the word delimiters. Hmmmmm....
> If you didn't find a solution, I'm sure not likely to! :)
Hi, Jim,
As for not finding a solution: I only skimmed the previous
thread, so I hope I am restating this correctly, but I think
what David didn't find a solution to was merely an efficient way
to write a regex that matches overlapping words. I don't think
that exists; I have looked at the animal before too. Please note
the qualifying adjective, "efficient", by which we mean here
_without having to repeat the overlapped word in the regex_.
I think David was talking about a point of arcane expression,
not something that should stop you from writing a recipe to do
exactly what you want.
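A quick way to see the overlap problem is a small Python emulation (not procmail itself). It assumes, following the procmailrc(5) description, that \< and \> match an actual non-word character (anything outside [a-zA-Z0-9_]) or a line boundary, rather than a zero-width boundary as in Perl or Python. Counting non-overlapping matches stands in for a 1^1 scoring condition. WORD_START, WORD_END and count_matches are hypothetical names for this sketch.

```python
import re

# Assumed emulation of procmail's word delimiters: unlike Python's zero-width
# \b, procmail's \< and \> consume a real non-word character, or match at a
# line boundary.
WORD_START = r"(?:^|[^a-zA-Z0-9_])"
WORD_END = r"(?:[^a-zA-Z0-9_]|$)"


def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping matches, roughly what a 1^1 condition scores."""
    return len(re.findall(pattern, text, flags=re.MULTILINE))


delimited = rf"{WORD_START}(?:wild|teens?){WORD_END}"  # (\<wild\>|\<teens?\>)
bare = r"(?:wild|teens?)"                              # delimiters removed

print(count_matches(delimited, "wild teens"))              # 1: "wild " used up the space
print(count_matches(bare, "wild teens"))                   # 2
print(count_matches(bare, "wilderness in thirteen"))       # 2: the false positive
print(count_matches(delimited, "wilderness in thirteen"))  # 0

# Separate conditions scan the text independently, so each one can use the space.
separate = sum(
    count_matches(rf"{WORD_START}{word}{WORD_END}", "wild teens")
    for word in ("wild", "teens?")
)
print(separate)                                            # 2
```

The first line is the crux: the trailing delimiter of "wild" swallows the space, leaving "teens" without a leading one. Writing each word as its own condition, at the cost of repeating words across conditions, gets the count of 2 without the "wilderness in thirteen" false positive.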
IOW, I am sure David would have no problem constructing a recipe
to do what you have in mind. It is eminently doable to write the
rule we have in our heads in procmail condition form. :-)
You want, I think, "wild", then one or more whitespace characters,
newlines, or a mix of the two, then "teen", with the two words
bounded by procmail word delimiters. Right?
Assuming you have defined a variable, which we'll call "$WS",
holding a space and a tab:
:0 B:
* $ 9876543210^0 ()\<wild[$WS]+teens?\>
* $ 9876543210^0 ()\<wild[$WS]*$+[$WS]*teens?\>
wildteens
should be it.
(Not tested.)
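Since the recipe is untested, here is a rough Python emulation of what its two conditions are meant to do, with the stray "(" dropped and teens? (as in Jim's pattern) so that both "teen" and "teens" count. It is a sketch under assumptions, not procmail: $WS is taken to be a space and a tab, \< and \> are modeled as consuming a non-word character, a $ in the middle of a pattern is assumed to match a newline, and scoring follows the w^x rule as I understand it from procmailsc(5). Under that rule a matching 9876543210^0 condition adds its full weight once, so the two conditions act as an OR. SAME_LINE, SPLIT_LINE, condition_score and recipe_matches are hypothetical names.

```python
import re

# Assumption: $WS holds a space and a tab.
WS = " \t"
# Assumed emulation of procmail's \< and \>, which consume a real non-word
# character (or match at a line boundary) rather than being zero-width.
WORD_START = r"(?:^|[^a-zA-Z0-9_])"
WORD_END = r"(?:[^a-zA-Z0-9_]|$)"

# The two conditions; procmail's mid-pattern $ is assumed to match a newline.
SAME_LINE = rf"{WORD_START}wild[{WS}]+teens?{WORD_END}"
SPLIT_LINE = rf"{WORD_START}wild[{WS}]*\n+[{WS}]*teens?{WORD_END}"

SUPREMUM = 9876543210  # the recipe's "practically infinite" weight


def condition_score(weight: int, exponent: int, matches: int) -> int:
    """Procmail-style w^x score: w + w*x + ... + w*x**(matches - 1)."""
    return sum(weight * exponent**k for k in range(matches))


def recipe_matches(body: str) -> bool:
    """Emulate ':0 B' with two 9876543210^0 conditions, i.e. an OR of both."""
    total = sum(
        condition_score(SUPREMUM, 0, len(re.findall(p, body, flags=re.MULTILINE)))
        for p in (SAME_LINE, SPLIT_LINE)
    )
    return total > 0


print(recipe_matches("hot wild\tteen pics"))     # True
print(recipe_matches("wild \n  teens"))          # True
print(recipe_matches("wilderness in thirteen"))  # False
```

Because each condition matches the whole phrase at once, the word-overlap problem never comes up, and "wilderness in thirteen" fails both conditions.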
P.S. You're not the Jim Osborn I knew in Heidelberg in the
late-middle nineties, right?
--
dman
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail