ietf-asrg

Re: [Asrg] greylisting with whitelist of good mailservers

2006-02-09 01:22:42

Thanks for the link, Eric. It does look interesting and I'll take a look
at that code. My greylisting is actually done as a sendmail milter and is
not directly part of SpamAssassin. However, my reputation system is (I've already finished the initial code as a SpamAssassin plugin); the two will interface by means of a database. The SpamAssassin plugin code will be made public by the end of the month or sooner, since I'll need some additional testers to help with this research.

I actually have considerably larger plans for all this, and a reputation database of single IPs is just the first step. One of the things I ran into is deciding which algorithm to use for calculating a mean value in real time. The obvious mean functions are the arithmetic and harmonic means; there is also the geometric mean, but its computational cost is not likely to be worth it. The algorithms I thus intend to try are:

          MA(n)*n + S(n+1)
 MA(n+1)= ----------------   Where MA(x) is arithmetic mean at step x
                n+1          with S(x) being actual score at step x

                 n+1             Where MH(x) is harmonic mean at step x
 MH(n+1)= ------------------     with S(x) actual score at step x
          n/MH(n) + 1/S(n+1)
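
As an illustration only, the two update rules above could be sketched in Python roughly as follows. The function names and the sample scores are made up for this example and are not taken from the plugin code. Note that the harmonic mean is only defined for nonzero (in practice, positive) scores, so scores that can be zero or negative would presumably have to be shifted first.

```python
def update_arithmetic(mean, n, score):
    """Return MA(n+1) from MA(n) = mean over n scores and the new score S(n+1)."""
    return (mean * n + score) / (n + 1)


def update_harmonic(mean, n, score):
    """Return MH(n+1) from MH(n) = mean over n scores and the new score S(n+1).

    Assumes every score is nonzero; the first score becomes the mean as-is.
    """
    if n == 0:
        return float(score)
    return (n + 1) / (n / mean + 1 / score)


# Hypothetical scores for one sending IP, processed one message at a time.
scores = [2.0, 4.0, 4.0]
ma = mh = 0.0
for n, s in enumerate(scores):
    ma = update_arithmetic(ma, n, s)
    mh = update_harmonic(mh, n, s)
```

Only the current mean and the count n need to be stored per IP, which is what makes these forms usable in real time.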

I also think that some kind of multiplier should be applied so that the latest
data has a higher weight than previous data, so this ends up being:

            MA(n,w)*n + S(n+1)*w    Where MA(x,w) is arithmetic mean
 MA(n+1,w)= --------------------    at step x with weight w (w>1)
                   n + w

                   n + w            Where MH(x,w) is harmonic mean at
 MH(n+1,w)= --------------------    step x with weight w (w>1)
            n/MH(n,w) + w/S(n+1)
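
A rough Python sketch of the weighted variants, again purely illustrative: the function names are invented, and it assumes n is still the plain count of scores seen so far (incremented by 1 per message), which the formulas above do not spell out. With w = 1 both functions reduce to the unweighted updates.

```python
def update_weighted_arithmetic(mean, n, score, w):
    """Return MA(n+1,w): the new score counts w times as much as an old one."""
    return (mean * n + score * w) / (n + w)


def update_weighted_harmonic(mean, n, score, w):
    """Return MH(n+1,w); assumes nonzero scores, first score is taken as-is."""
    if n == 0:
        return float(score)
    return (n + w) / (n / mean + w / score)


# Hypothetical example: mean 2.0 over 2 earlier scores, new score 8.0, w = 2.
wa = update_weighted_arithmetic(2.0, 2, 8.0, 2)   # (2*2 + 8*2) / (2 + 2)
wh = update_weighted_harmonic(2.0, 2, 8.0, 2)     # (2 + 2) / (2/2 + 2/8)
```

Whether n should grow by 1 or by w after each weighted update is a design choice left open here; growing it by 1 makes the result differ from a true weighted mean over the whole history.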


I'm actually not good with probability theory, so if others can comment on whether I'm on the right track, that would be good. Also, the spam score function S(x) feels to me like a stochastic process, so something like Bayesian score calculations should probably be used. In any case, if people here can point me to a good book on real-time calculations and operations with stochastic functions, I'd appreciate it.

---
William Leibzon
  mailto: william(_at_)completewhois(_dot_)com
Anti-Spam and Email Security Research Worksite:
  http://www.elan.net/~william/emailsecurity/
Whois & DNS Network Investigation Tools:
  http://www.completewhois.com

On Thu, 9 Feb 2006, Eric A. Hall wrote:

On 1/29/2006 10:27 PM, William Leibzon wrote:

What I'm thinking for one of the next stages is to change this whitelisting
IP database into (IP,score), where the score is updated and is the mean of the
scores of the emails that came from the system before, i.e. it basically
is a real-time updated reputation system.

I did some work on something like this a while back by overloading the
SpamAssassin auto-whitelist database--tuples and reputation information is
already stored there, and I pass most incoming mail through SA while the
session is still active, so I get to reuse some of that info for free.
http://www.ehsco.com/misc/sagrey/ is where the SA plugin lives if you want
to look at it.

Right now I'm only using it to add an extra score for mail that appears to
be spam and originated from an unknown tuple, but what I want to do is
defer acceptance based on whether or not the rule fired (essentially
allowing me to restrict greylisting to mail that is likely spam from
unknown tuples). I couldn't do that with Postfix last time I looked
(header checks could not generate a DEFER action) and I haven't had time
to rebuild my whole mail system yet.

As to what you are pursuing, a similar approach would let you leverage the
reputation score associated with the tuple in the AWL, which seems to be
mostly what you are looking for.

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg