On 27/Feb/10 07:22, Chris Lewis wrote:
On 2/26/2010 7:22 AM, Alessandro Vesely wrote:
[...]
Heck, SpamAssassin even manages to tune Bayesian without having any end-user
feedback at all.
I never ventured into such esoteric settings. Are there any howtos or docs
about it?
I think it's called "Autolearn". I think it works by treating SA scores >
<threshold> as "spam", and scores < <possibly a different threshold> as "ham",
and tuning Bayesian from that. IOW: the existing SA rules refine Bayesian, and
in the long term this allows Bayesian to cross-correlate across individual
emails, and to score stuff that the SA rules don't necessarily even see.
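The thresholding described above can be sketched roughly as follows. This is only an illustration of the idea, not SpamAssassin's actual code: the names `autolearn_label`, `TokenCounts` and `process` are hypothetical, and the two threshold values are assumptions chosen for the example.

```python
from collections import Counter

# Assumed, illustrative thresholds; a real deployment would tune these.
SPAM_THRESHOLD = 12.0  # rule scores above this are learned as spam
HAM_THRESHOLD = 0.1    # rule scores below this are learned as ham


def autolearn_label(rule_score):
    """Return 'spam', 'ham', or None for the uncertain middle band."""
    if rule_score > SPAM_THRESHOLD:
        return "spam"
    if rule_score < HAM_THRESHOLD:
        return "ham"
    return None  # gray area: do not train on it automatically


class TokenCounts:
    """Hypothetical stand-in for a Bayesian token database."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn(self, tokens, label):
        # Count each distinct token once per message.
        target = self.spam if label == "spam" else self.ham
        target.update(set(tokens))


def process(db, tokens, rule_score):
    """Train the token database only when the rule score is decisive."""
    label = autolearn_label(rule_score)
    if label is not None:
        db.learn(tokens, label)
    return label
```

Note that messages scoring between the two thresholds are never learned here, so the token database only ever sees what the static rules were already confident about.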
I've found some explanation in
http://spamassassinbook.packtpub.com/chapter9_preview.htm . My
understanding is that auto-learn works against that gray area of
uncertain cases. The book's author recommends not to rely on that
feature alone. He notes that "Once a false positive occurs the
Bayesian database will begin to lose effectiveness, and future
Bayesian results will be compromised."
If we consider machine-learning for what it is, we must agree that
current technology does not allow 'puters to understand human speech
better than we do. Even though SA is able to classify a bunch of
messages much better than an unmotivated human, it does not really
/understand/ their contents. Therefore, it has to be trained by a
(motivated) human, which implies interaction with users.
[...] in order to attach to junk buttons a meaning of "filter messages /like/
this" we would need to define what that means in rather unambiguous terms.
No, you don't. That's up to the implementer of the report handler what it does.
I agree that it's up to abuse report consumers to state what they do.
A discussion about "generalized FBLs" will probably involve concerns
about who is entitled to consume ARs, and may also consider whether
consuming ARs implies any duty. Letting users know about any outcome,
or letting them further modify any such outcome, are examples of
possible duties. The fewer of them, the better.
For example, consider the last MX sending an AR to the 1st MX, as in
the picture in http://wiki.asrg.sp.am/wiki/Abuse_Reporting . If the
1st MX feeds the reported message to its Bayesian engine, then it
should also allow forwarded users to check uncertain messages and
correct any false positives. If it does not grant such interactivity,
users may want to omit forwarding to it, in order to avoid losing mail.
Would it make sense to require that interactive filtering activity be
limited to the last MX? Bandwidth-wise, it is counter-intuitive. It
may be better to avoid this argument entirely.
Just as it is with Bayes.
Why are you treating this any differently from spam/ham training in Bayes? It's
no different.
Potentially, it is orders of magnitude better than Bayes.
I'd lean toward specifying just how to deliver abuse reports. Neither junk
buttons nor their color should be mandated.
Who is trying to specify buttons or their color?
I hope we won't. But it seems difficult to specify MUA's AR addresses
without creating false expectations.
I'm aiming for a specification that permits a single <user action> to
communicate upstream for _both_ filtering and reporting purposes, where whether it's
used for filtering or reporting or both in any given instance is up to the site admin
and/or end-user.
+1, and I would welcome an efficient IMAP implementation in that
sense. However, the spec should also allow just sending complaints.
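To make the "single user action, site-defined meaning" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `Policy`, `Outcome` and `handle_junk_action` are names invented for illustration, and no existing MUA, IMAP extension or report format is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical site/user policy for what one junk action triggers."""
    train_filter: bool = True
    send_report: bool = True


@dataclass
class Outcome:
    """Records what was done, standing in for real filter/report backends."""
    trained: list = field(default_factory=list)
    reported: list = field(default_factory=list)


def handle_junk_action(message_id, policy, outcome):
    """Dispatch one user action to filtering, reporting, or both."""
    if policy.train_filter:
        outcome.trained.append(message_id)   # e.g. feed a local filter
    if policy.send_report:
        outcome.reported.append(message_id)  # e.g. queue an abuse report
    return outcome
```

A site that only wants complaints would use `Policy(train_filter=False, send_report=True)`; the user action itself stays the same in every configuration.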
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg