On Mon, Apr 30, 2007 at 12:41:18PM +0200, Ruud H.G. van Tol wrote:
> Dallman Ross wrote:
>> :0 B:
>> * $! [^$WS][$WS]+[^$WS]
>> ONEWORD
>
> How
> about
> a
> funky
> body
> that
> looks
> like
> this
> ?
> --<<<--==+X[2266df7a330c921e0d416be1cc667191]
> Content-Type:text/plain;charset="iso-8859-1";format=flowed
> Content-Transfer-Encoding:7bit
[snip similar]
I already explicitly acknowledged those would be caught, Ruud.
It's not a surprise vis-a-vis my coding intentions. Michelle
doesn't want spam. I would argue that 99.999% of any such body
content will be spam. It's a lagniappe, in other words.
Last night I wrote (when I added in the missing "NOT"):
> I didn't bother to add line breaks, because if
> a message is just one word (or none) per line
> over more than one line and then ends, it's also
> highly suspect.
That is exactly what you just showed.
And this morning I added in a follow-up:
> One other thing to realize about this, which I thought of last
> night but didn't mention, is that it would also capture some
> non-Western-charset messages that have only one part (i.e., not
> multipart messages). If they don't have any spaces in the body,
> they will be caught. Running the recipe against a couple hundred
> of my most recent spam messages, I catch three Japanese-charset
> messages. Depending on what you want, that might or might not be
> considered a boon or lagniappe. To avoid that, well, put an
> appropriate limiting condition in the recipe.
So I anticipated your "objection" already, in other words. If the
person using the algorithm knows what it will do and wants just
that result, then it's not a problem. Of course, you are right
to show that one needs to think about other consequences to an
algorithm, whether intended or unintended. :-)
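
To illustrate the kind of limiting condition mentioned in the follow-up
quoted above, here is a minimal, untested sketch. It assumes WS is already
defined earlier in the rcfile (presumably as a space and a tab), and the
three charset names are only examples of what one might choose to exempt;
neither detail comes from this thread.

```procmail
# Sketch only. WS is assumed to be set earlier in the rcfile.
# The charset list is an illustrative assumption; adjust to taste.
# First condition: skip messages whose header declares one of these charsets.
# Second condition: the original test, i.e. the body never contains
# two words separated by $WS.
:0 B:
* ! H ?? ^Content-Type:.*charset="?(iso-2022-jp|shift_jis|euc-jp)
* $! [^$WS][$WS]+[^$WS]
ONEWORD
```

The "H ??" prefix makes that one condition look at the header even though
the B flag points the rest of the recipe at the body. A message that
declares its charset only inside a MIME part, or not at all, would still
be caught.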
Dallman
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail