Re: Scoring question

On Thu, Aug 25, 2005 at 12:04:13PM +0300, Udi Mottelo wrote:

[quoting all 89 lines of Dallman's post; Udi, please don't
 do that.  Please trim appropriately.]

On Thu, 25 Aug 2005, Dallman Ross wrote:

On Wed, Aug 24, 2005 at 04:51:51PM -0400, Louis Proyect wrote:

I want to use scoring to filter out spam on the basis of
multiple "to addresses" that include me and anybody else on my
isp. In other words, if mail is addressed to "lnp3(_at_)panix(_dot_)com"
and "xyz(_at_)panix(_dot_)com", it should go into /dev/null.

You want to count To-addresses.  So look there, not
anywhere in the header.

  :0:
  * -1^0 ^To:\/.*
  *  1^1  MATCH ?? @panix[.]com
  *  1^0  ()\/^
  * -1^0 ^Cc:\/.*
  *  1^1  MATCH ?? @panix[.]com
  MYSPAM


The middle condition clears the MATCH value before it is
reused for Cc:.  The reason is, if there
isn't a Cc: header, we'll still have the value saved
to MATCH from the To: header (if there was one).  This
gets rid of that.

The recipe is still vulnerable to an instance of you
being mailed like so:

  you(_at_)panix(_dot_)com <you(_at_)panix(_dot_)com>


      What about looking for @panix[.]com([>,]|$) ?


Yes, it's not perfect, but it's a slight improvement.
If we had $NL defined already, we could do it this way:

    @panix[.]com[>),$NL]

Still subject to being fooled if the mail comes in with
whitespace after each address (with or without a comma).
But it's an improvement, I guess.

I also thought of this alternative:

    @panix[.]com[^"'$WS]

I'm not sure which is better.  I think that last
one would need to be ([^"'$WS]|$), actually.  But
I'll let that go for the time being.  ($WS would
have to have been set earlier.)
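
For reference, here is a minimal sketch of how those two variables
might be set near the top of the rcfile.  The values are my assumption
of the usual convention (a lone newline for NL, a space plus a tab for
WS).  A condition only expands them when it carries the "$" flag.

```procmail
# Hypothetical setup lines; they must come before any recipe using them.
# NL is a single newline (the closing quote sits on the next line).
NL="
"
# WS is one space followed by one literal tab character.
WS=" 	"
```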


Meanwhile, with a night's sleep under my belt, I realize
my scoring is flawed.  If there is no Cc:, we have one
too many on the count.  While it's algorithmically cleaner
to break this down into a couple of recipes and avoid
the problem that way, I nevertheless felt challenged
to make it work right in one recipe.  I do like the
condition that clears the MATCH value!  Anyway, this
one seems right.  It assumes "SMALL" has been set
to some benign value such as 0.000001 beforehand.

   :0:
   *        -1 ^0 ()\/^
   * $ -$SMALL ^0 ^To:\/.*
   * $       1 ^1 MATCH ?? @panix[.]com[^"'$WS]
   *              ()\/^
   * $ -$SMALL ^0 ^Cc:\/.*
   * $       1 ^1 MATCH ?? @panix[.]com[^"'$WS]
   MYSPAM
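
To see why this version counts correctly, here is a sketch of the
setup line plus a score tally written as comments.  The tally assumes
both To: and Cc: are present and that n is the combined number of
@panix.com hits; it is an illustration, not output from procmail.

```procmail
# Hypothetical setup line; it must appear above the recipe.
SMALL=0.000001

# Score tally (a recipe matches only if the final score is positive):
#   first condition, always matches      -1
#   To: header present                   -SMALL
#   Cc: header present                   -SMALL
#   each @panix.com hit                  +1
#
#   n = 0:  -1 - 2*SMALL      = -1.000002   recipe fails
#   n = 1:  -1 - 2*SMALL + 1  = -0.000002   recipe fails
#   n = 2:  -1 - 2*SMALL + 2  =  0.999998   recipe delivers to MYSPAM
```

The unweighted ()\/^ line in the middle adds nothing to the score, so
a missing Cc: no longer shifts the count.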


Here's yet a further improvement that jumps
immediately to the action line as soon as the
second hit happens.  It requires $MAXINT to have
been set to procmail's maximum integer value
(but no higher), which per "man procmailsc" is
2147483647.

   :0:
   *         -1 ^0
   * $  $MAXINT ^0 ()\/^
   * $  -$SMALL ^0 ^To:\/.*
   * $        1 ^1 MATCH ?? @panix[.]com[^"'$WS]
   * $  -$SMALL ^0 ()\/^
   * $  -$SMALL ^0 ^Cc:\/.*
   * $        1 ^1 MATCH ?? @panix[.]com[^"'$WS]
   * $ -$MAXINT ^0 ()\/^
   MYSPAM
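
As a rough check of the early exit, here is the setup line and a
running tally in comments.  It assumes To: and Cc: are both present
with one hit in each, and it relies on my reading of "man procmailsc"
that once the score reaches the maximum, the remaining weighted
conditions are skipped.

```procmail
# Hypothetical setup lines; they must appear above the recipe.
SMALL=0.000001
MAXINT=2147483647

# Running score, condition by condition:
#   -1 ^0  (empty condition)        -1
#   +MAXINT                         MAXINT - 1
#   To: present                     MAXINT - 1 - SMALL
#   first hit                       MAXINT - SMALL
#   clear MATCH                     MAXINT - 2*SMALL
#   Cc: present                     MAXINT - 3*SMALL
#   second hit                      MAXINT + 1 - 3*SMALL
#       -> at or above the maximum: later conditions are skipped
#          and the recipe succeeds.
#
# With only one hit the last condition still runs:
#   MAXINT - 3*SMALL - MAXINT = -3*SMALL   (not positive, recipe fails)
```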

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail