ietf-asrg

Re: [Asrg] Summary of junk button discussion

2010-02-27 05:49:36
On 27/Feb/10 07:22, Chris Lewis wrote:
On 2/26/2010 7:22 AM, Alessandro Vesely wrote:
[...]
Heck, SpamAssassin even manages to tune Bayesian without having any end-user 
feedback at all.

I never ventured into such esoteric settings. Are there any howtos or docs 
about it?

I think it's called "Autolearn". I think it works by treating SA scores > <threshold> as "spam", 
and scores < <possibly a different threshold> as "ham", and tunes Bayesian from that. IOW: the existing SA 
rules refine Bayesian, and in the long term this allows Bayesian to cross-correlate across individual emails, and to 
score stuff that the SA rules don't necessarily even see.

I've found some explanation in http://spamassassinbook.packtpub.com/chapter9_preview.htm . My understanding is that auto-learn works against that gray area of uncertain cases. The book's author recommends not relying on that feature alone. He notes that "Once a false positive occurs the Bayesian database will begin to lose effectiveness, and future Bayesian results will be compromised."
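
To make the threshold idea concrete, here is a minimal sketch of the scheme as described above: messages whose rule score is clearly high or clearly low are fed to the Bayesian learner, and everything in between is left alone. This is not SpamAssassin's code. The names `BayesStore` and `autolearn` and both threshold values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, for illustration only; real values are site-specific.
SPAM_THRESHOLD = 12.0
HAM_THRESHOLD = 0.1


@dataclass
class BayesStore:
    """Toy stand-in for a Bayesian token database (not SpamAssassin's)."""
    spam_tokens: dict = field(default_factory=dict)
    ham_tokens: dict = field(default_factory=dict)

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        counts = self.spam_tokens if is_spam else self.ham_tokens
        for token in set(text.lower().split()):
            counts[token] = counts.get(token, 0) + 1


def autolearn(store, text, rule_score):
    """Train only on confident rule verdicts; return 'spam', 'ham', or None."""
    if rule_score > SPAM_THRESHOLD:
        store.learn(text, is_spam=True)
        return "spam"
    if rule_score < HAM_THRESHOLD:
        store.learn(text, is_spam=False)
        return "ham"
    return None  # gray area: the message is not used for training
```

The sketch also shows where the book's warning bites: a legitimate message that scores above the spam threshold is learned as spam without anyone looking at it, and nothing here would ever correct that.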

If we consider machine-learning for what it is, we must agree that current technology does not allow 'puters to understand human speech better than we do. Even though SA is able to classify a bunch of messages much better than an unmotivated human, it does not really /understand/ their contents. Therefore, it has to be trained by a (motivated) human, which implies interaction with users.

[...] in order to attach to junk buttons a meaning of "filter messages /like/ 
this" we would need to define what that means in rather unambiguous terms.

No, you don't. That's up to the implementer of the report handler what it does.

I agree that it's up to abuse report consumers to state what they do.

A discussion about "generalized FBLs" will probably involve concerns about who is entitled to consume ARs, and may also consider whether consuming ARs implies any duty. Letting users know about any outcome, or letting them further modify any such outcome, are examples of possible duties. The less of them, the better.

For example, consider the last MX sending an AR to the 1st MX, as in the picture in http://wiki.asrg.sp.am/wiki/Abuse_Reporting . If the 1st MX feeds the reported message to its Bayesian engine, then it should also allow forwarded users to check uncertain messages and correct any false positives. If it does not grant such interactivity, users may want to omit forwarding to it, in order to avoid losing mail.

Would it make sense to require that interactive filtering activity be limited to the last MX? Bandwidth-wise, it is counter-intuitive. It may be better to avoid this argument entirely.

Just as it is with Bayes.

Why are you treating this any different than spam/ham training in Bayes? It's 
no different.

Potentially, it is orders of magnitude better than Bayes.

I'd lean toward specifying just how to deliver abuse reports. Neither junk 
buttons nor their color should be mandated.

Who is trying to specify buttons or their color?

I hope we won't. But it seems difficult to specify MUA's AR addresses without creating false expectations.

I'm aiming for a specification that permits a single <user action> to 
communicate upstream for _both_ filtering and reporting purposes, where whether it's 
used for filtering or reporting or both in any given instance is up to the site admin 
and/or end-user.

+1, and I would welcome an efficient IMAP implementation in that sense. However, the spec should also allow just sending complaints.
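
As a rough illustration of that split, a single user action could be mapped to upstream actions by local policy, something like the sketch below. The names `Policy` and `handle_junk_action` and the action labels are hypothetical and do not come from any existing specification or MUA.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical site-admin and/or end-user preference."""
    train_filter: bool = True
    send_report: bool = False


def handle_junk_action(message_id, policy):
    """Return the upstream actions implied by one junk-button press."""
    actions = []
    if policy.train_filter:
        actions.append(("train_filter", message_id))
    if policy.send_report:
        actions.append(("send_abuse_report", message_id))
    return actions
```

With `Policy(train_filter=False, send_report=True)` the same button just sends a complaint, which is the case I would like the spec to keep possible.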
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg