Re: [Asrg] Summary of junk button discussion

On 27/Feb/10 07:22, Chris Lewis wrote:

On 2/26/2010 7:22 AM, Alessandro Vesely wrote:

[...]

Heck, SpamAssassin even manages to tune Bayesian without having any end-user 
feedback at all.


I never ventured into such esoteric settings. Are there howtos or any docs 
about it?


I think it's called "Autolearn". I think it works by treating SA scores > <threshold> as "spam", 
and scores < <possibly a different threshold> as "ham", and tunes Bayesian from that. IOW: the existing SA 
rules refine Bayesian, and in the long term this allows Bayesian to cross-correlate across individual emails, and to 
score stuff that the SA rules don't necessarily even see.
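The threshold idea described above can be sketched roughly as follows. This is only an illustration of the general mechanism, not SpamAssassin's actual code: the `BayesStore` class, the `autolearn` function, the whitespace tokenizer and the two threshold values are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen for illustration only.
SPAM_THRESHOLD = 12.0
HAM_THRESHOLD = 0.1


@dataclass
class BayesStore:
    """Toy token store standing in for a Bayesian database."""
    spam_tokens: Counter = field(default_factory=Counter)
    ham_tokens: Counter = field(default_factory=Counter)
    spam_count: int = 0
    ham_count: int = 0

    def learn(self, tokens, is_spam):
        # Count each distinct token once per learned message.
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_count += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_count += 1


def autolearn(store, rule_score, text):
    """Train the store only when the rule-based score is decisive.

    Returns "spam", "ham", or None when the score falls in the
    gray area between the two thresholds and nothing is learned.
    """
    tokens = set(text.lower().split())  # naive tokenizer (assumption)
    if rule_score > SPAM_THRESHOLD:
        store.learn(tokens, is_spam=True)
        return "spam"
    if rule_score < HAM_THRESHOLD:
        store.learn(tokens, is_spam=False)
        return "ham"
    return None
```

Messages scoring between the two thresholds are left alone, so in this sketch the rules only ever teach the Bayesian store from cases they are already confident about.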

I've found some explanation in http://spamassassinbook.packtpub.com/chapter9_preview.htm . My understanding is that auto-learn works against that gray area of uncertain cases. The book's author recommends not to rely on that feature alone. He notes that "Once a false positive occurs the Bayesian database will begin to lose effectiveness, and future Bayesian results will be compromised."

If we consider machine-learning for what it is, we must agree that current technology does not allow 'puters to understand human speech better than we do. Even though SA is able to classify a bunch of messages much better than an unmotivated human, it does not really /understand/ their contents. Therefore, it has to be trained by a (motivated) human, which implies interaction with users.

[...] in order to attach to junk buttons a meaning of "filter messages /like/ 
this" we would need to define what that means in rather unambiguous terms.


No, you don't. That's up to the implementer of the report handler what it does.


I agree that it's up to abuse report consumers to state what they do.

A discussion about "generalized FBLs" will probably involve concerns about who is entitled to consume ARs, and may also consider whether consuming ARs implies any duty. Letting users know about any outcome, or letting them further modify any such outcome, are examples of possible duties. The fewer of them, the better.

For example, consider the last MX sending an AR to the 1st MX, as in the picture in http://wiki.asrg.sp.am/wiki/Abuse_Reporting . If the 1st MX feeds the reported message to its Bayesian engine, then it should also allow forwarded users to check uncertain messages and correct any false positives. If it does not grant such interactivity, users may want to omit forwarding to it, in order to avoid losing mail.

Would it make sense to require that interactive filtering activity be limited to the last MX? Bandwidth-wise, it is counter-intuitive. It may be better to avoid this argument entirely.

Just as it is with Bayes.

Why are you treating this any different than spam/ham training in Bayes? It's 
no different.


Potentially, it is orders of magnitude better than Bayes.

I'd lean toward specifying just how to deliver abuse reports. Neither junk 
buttons nor their color should be mandated.


Who is trying to specify buttons or their color?

I hope we won't. But it seems difficult to specify MUA's AR addresses without creating false expectations.

I'm aiming for a specification that permits a single <user action> to 
communicate upstream for _both_ filtering and reporting purposes, where whether it's 
used for filtering or reporting or both in any given instance is up to the site admin 
and/or end-user.
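One way to picture a single user action serving both purposes is a small dispatcher driven by local policy. This is a hedged sketch only: `Policy`, `handle_junk_action` and the two callbacks are hypothetical names invented here, and nothing in this thread defines such an interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Policy:
    """Site-admin and/or end-user choice (hypothetical structure)."""
    train_filter: bool
    send_report: bool


def handle_junk_action(
    message: str,
    policy: Policy,
    train: Callable[[str], None],
    report: Callable[[str], None],
) -> List[str]:
    """Route one user action to filtering, reporting, or both."""
    done = []
    if policy.train_filter:
        train(message)   # e.g. feed the message to a local filter
        done.append("filter")
    if policy.send_report:
        report(message)  # e.g. forward an abuse report upstream
        done.append("report")
    return done
```

The user action itself stays the same in every case; only the policy decides which consumers see the message.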

+1, and I would welcome an efficient IMAP implementation in that sense. However, the spec should also allow just sending complaints.

_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg