ietf-asrg

Re: [Asrg] greylisting with whitelist of good mailservers

2006-02-10 04:03:56
William Leibzon:


That is why I said it looks like a stochastic process and was not sure if
using a mean function is appropriate. It should also be noted that since
I'm using quantified (x.y) scoring data the sample space can be considered
to be a finite set - I can't yet decide if/how this would help though.


Finite - OK, but I suspect that the scores are really only ordinal. Do we
know how *much* 'worse' a message with a score of 10.0 is than one with 5.0?
Or do we merely know that one with 7.5 falls in between them? Is 5:10 the
same as 2.5:5 or 10:15? Or what?

On the other hand, we _do_ know that a source with 50/100 messages scoring
over your threshold has twice the rate of one with 25/100. You could have
more than two buckets of course, but two is easy.
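
A minimal sketch of the two-bucket idea, added for illustration: each message
is counted as over or not over a threshold, and a source is described by the
fraction that is over. The function name, the threshold of 5.0 and the
made-up score lists are assumptions, not anything taken from the thread.

```python
def over_threshold_rate(scores, threshold):
    """Return the fraction of scores strictly above the threshold."""
    if not scores:
        raise ValueError("no scores recorded for this source")
    hits = sum(1 for s in scores if s > threshold)
    return hits / len(scores)


# Hypothetical sources: 50 of 100 and 25 of 100 messages over the threshold.
THRESHOLD = 5.0  # assumed cut-off, for illustration only
source_a = [6.0] * 50 + [1.0] * 50
source_b = [6.0] * 25 + [1.0] * 75

rate_a = over_threshold_rate(source_a, THRESHOLD)
rate_b = over_threshold_rate(source_b, THRESHOLD)

# Rates are on a ratio scale, so "twice the rate" is a meaningful statement
# even if the underlying scores are only ordinal.
print(rate_a, rate_b, rate_a / rate_b)
```

Only the ordering of each score relative to the threshold is used here, so
the question of how much worse a 10.0 is than a 5.0 never arises.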


Are they? Do you really want a set of scores like [5.1, 5.0, 5.2, 0.1]
to give the same rep. (arithmetic mean = 3.85) as the set
[3.8, 3.9, 4.0, 3.7]?

Testing will show if this concept works. But for now, yes, I do want
them to give the same or similar score; I think that with a larger sample
this would give fairly accurate information.
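
To make the disagreement concrete, here is a small sketch comparing the two
example score sets by arithmetic mean and by fraction over a threshold. The
threshold of 4.5 and the helper name `summarize` are assumptions chosen purely
for illustration.

```python
from statistics import mean

set_a = [5.1, 5.0, 5.2, 0.1]
set_b = [3.8, 3.9, 4.0, 3.7]
THRESHOLD = 4.5  # assumed cut-off, not from the thread


def summarize(scores, threshold):
    """Return the rounded mean and the fraction of scores above the threshold."""
    over = sum(1 for s in scores if s > threshold)
    return {
        "mean": round(mean(scores), 2),
        "over_rate": over / len(scores),
    }


summary_a = summarize(set_a, THRESHOLD)
summary_b = summarize(set_b, THRESHOLD)

# Both sets share the same mean, yet they differ sharply in how many
# messages exceed the threshold.
print(summary_a)
print(summary_b)
```

Under these assumptions the mean treats the two sources as identical, while
the over-threshold rate separates them.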


I think it's likely to work moderately well. It's a philosophical thing
really, innit? Are you making repeated measures of the same quantity?
Unless you assume that the mail stream associated with a source is bound to
be homogeneous, I'd say not. If you're considering the probability that a
new message from some source will have a particular quality - then you may
be interested in the number of previous messages that have that quality.

I don't believe that a mean is really appropriate or especially useful.
Incidentally, if you want something to vary with datum age (and sample
size) it should probably be *confidence*, rather than 'reputation'.
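
One way to let *confidence* depend on sample size, sketched here as an
illustration: keep the over-threshold rate as the estimate and attach a Wilson
score interval to it, whose width shrinks as more messages are seen. The
Wilson interval is a technique swapped in for this example, not something
proposed in the thread; the function name and the z value of 1.96 (roughly a
95% interval) are assumptions. Datum age is not modelled.

```python
import math


def wilson_interval(hits, total, z=1.96):
    """Wilson score interval for a proportion; returns (low, high)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= hits <= total:
        raise ValueError("hits must be between 0 and total")
    p = hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Same observed rate (50%), different sample sizes.
low_small, high_small = wilson_interval(5, 10)
low_large, high_large = wilson_interval(50, 100)

# The estimate is unchanged; only the width of the interval differs.
print(high_small - low_small)
print(high_large - low_large)
```

The estimate itself stays at 50% in both cases; what changes with the number
of messages is how tightly that estimate is pinned down.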


Rgds,
JRK


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg