
Re: One-Word-Spam on nearly all Mailinglists

2007-04-30 03:14:48
On Mon, Apr 30, 2007 at 10:07:02AM +0300, Udi Mottelo wrote:


Well, okay, but this all seems rather silly to me.  The goal

      It's a dilemma:  answer the questioner directly, or give
      something more general that opens the mind and reminds
      others of some of the options.  In this example I used the
      score that Michelle used, because scoring can be more
      flexible: you can count the words and set the number you
      want.  Anyway, it is good to see as many examples and
      opinions as possible.

For the record, I wasn't so much calling your answer silly.  I was
responding more to Michelle's original approach.  All those pipes
and calls to programs to solve a simple problem that procmail can
do easily natively with a single condition . . .

And generally, my response to the "dilemma" that you cite is to try
to focus on the problem itself, not on the interpretation of, or
response to, the problem that the original poster came up with --
which might well be flawed.

We can always count words, if that's what we want to do, and it
could be interesting to show how, as you did.  I have nothing
against that.  But the stated problem was, "I want to mark as spam
a message with only one word."  We don't need to count words to
do that.[1]  It's good to recognize that all that body stuff can be
painful and wasteful -- especially with the pipes and calls that
were first offered.


[1] Heuristic: "If there is NOT any instance in the body of
                whitespace bordered on each side by non-whitespace,
                we infer that there is no line to be found with
                more than one 'word.'"

    Algorithm (presumes WS has been defined earlier):

                :0 B:
                * $! [^$WS][$WS]+[^$WS]
                ONEWORD
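
    For anyone who wants to try the heuristic outside procmail, here
    is a minimal Python sketch of the same idea.  It assumes WS means
    a space and a tab (the recipe above only says WS was defined
    earlier), and it is a rough analogue, not an exact emulation of
    procmail's regex engine.  The names MULTIWORD and
    looks_like_one_word are made up for illustration.

```python
import re

# Assumption: WS is a space and a tab. \S keeps each match on one line,
# mirroring "no line with more than one word" from the heuristic.
MULTIWORD = re.compile(r"\S[ \t]+\S")


def looks_like_one_word(body: str) -> bool:
    """True when no line has whitespace bordered on both sides by non-whitespace."""
    return MULTIWORD.search(body) is None


if __name__ == "__main__":
    for sample in ["Viagra\n", "hello world\n", "  padded  \n", "one\ntwo\n"]:
        print(repr(sample), looks_like_one_word(sample))
```

    Note that a body with one word on each of several lines also
    passes the test, since the heuristic only looks for a line with
    more than one word.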
                

One other thing to realize about this, which I thought of last
night but didn't mention, is that it would also capture some
non-Western-charset messages that have only one part (i.e., not
multipart messages).  If they don't have any spaces in the body,
they will be caught.  Running the recipe against a couple hundred
of my most recent spam messages, I catch three Japanese-charset
messages.  Depending on what you want, that might or might not be
considered a boon or lagniappe.  To avoid that, well, put an
appropriate limiting condition in the recipe.
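
One possible shape for such a limiting condition, sketched in Python
so it can be tried against saved messages: skip the one-word test when
the Content-Type header declares a charset you want to exempt.  The
charset list and the names EXEMPT_CHARSETS and is_one_word_spam are
assumptions for illustration only; pick whatever suits your own mail.

```python
import re
from email import message_from_string

# Hypothetical exemption list; adjust to the charsets you actually receive.
EXEMPT_CHARSETS = {"iso-2022-jp", "shift_jis", "euc-jp"}

# Assumption: "whitespace" means space and tab, matched within a single line.
MULTIWORD = re.compile(r"\S[ \t]+\S")


def is_one_word_spam(raw: str) -> bool:
    """Apply the one-word heuristic, with a charset-based limiting condition."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # The raw body of a multipart message typically carries part headers
        # that contain spaces, so the heuristic would not fire on it anyway.
        return False
    if (msg.get_content_charset() or "") in EXEMPT_CHARSETS:
        return False  # limiting condition: exempt these charsets
    return MULTIWORD.search(msg.get_payload()) is None
```

With this sketch, a single-part message declaring charset=iso-2022-jp
is never flagged, while a plain message whose body is a single word
still is.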

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail