Re: Attempts at establishing harmful conventions

2004-11-30 19:59:38

On Nov 30 2004, Steve Dorner wrote:

> At 7:10 PM -0500 11/30/04, Keith Moore wrote:
> > users are lazy - they will sometimes reply to a message just as an
> > easy way to get the same recipient list, even if the topics are
> > different.  often they don't even change the subject.
>
> Barring Bayesian analysis, I think we're SOL if they don't change the subject.
>
> But if they do change the subject, and one slavishly follows IRT
> anyway, how is that A Good Thing?

This is somewhat complex. Bayesian analysis has the most clues to work
with if it's allowed to examine the full message body. 
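A minimal sketch of what body-based Bayesian analysis could look like, assuming a naive-Bayes-style comparison of unigram token frequencies: the reply body is scored against the text of the thread it claims to belong to and against a background sample of unrelated mail. The function names, the tokenizer, add-one smoothing, and reading a positive score as "same thread" are all illustrative assumptions, not a method taken from this discussion.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_model(texts: List[str]) -> Counter:
    # Count token occurrences across all the given message bodies.
    counts: Counter = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return counts


def log_prob(token: str, counts: Counter, total: int, vocab: int) -> float:
    # Add-one (Laplace) smoothing so unseen tokens get a nonzero probability.
    return math.log((counts[token] + 1) / (total + vocab))


def thread_score(body: str, thread_texts: List[str], background_texts: List[str]) -> float:
    """Log-likelihood ratio: positive means the body looks more like the
    thread it replies to than like the background mail."""
    thread = unigram_model(thread_texts)
    background = unigram_model(background_texts)
    vocab = len(set(thread) | set(background)) + 1  # +1 reserves mass for unseen tokens
    t_total = sum(thread.values())
    b_total = sum(background.values())
    score = 0.0
    for tok in tokenize(body):
        score += log_prob(tok, thread, t_total, vocab)
        score -= log_prob(tok, background, b_total, vocab)
    return score
```

With a thread about fetching IMAP headers and background mail about lunch plans, `thread_score("imap headers fetch", ...)` comes out positive and `thread_score("friday lunch meeting", ...)` negative. Note that every call needs the full body text, which is where the cost discussed below comes from.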

Headers tend to be too similar across threads, so detecting a new
thread simply on the basis of header statistics would probably have
a low success rate (contrast with spam filtering, where the similarity
of headers actually helps to distinguish interesting messages from
uninteresting ones).
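For contrast, the cheap header-only rule raised in the quoted text (follow In-Reply-To unless the subject changed) can be sketched with Python's standard `email` package. The prefix pattern and the function names are hypothetical; note that a reply which keeps the old subject is always accepted, which is exactly the case headers alone cannot resolve.

```python
import re
from email import message_from_string
from typing import Optional

# Hypothetical prefix list: any run of leading "Re:", "Fw:" or "Fwd:" markers.
_PREFIX = re.compile(r"^(\s*(re|fwd?)\s*:)+", re.IGNORECASE)


def normalize_subject(subject: Optional[str]) -> str:
    # Strip reply/forward prefixes, then fold whitespace and case.
    s = _PREFIX.sub("", subject or "")
    return " ".join(s.split()).lower()


def same_thread_by_headers(reply_raw: str, parent_subject: str) -> bool:
    """Trust In-Reply-To unless the normalized subject changed.
    A topic change under an unchanged subject is not detected."""
    msg = message_from_string(reply_raw)
    if msg["In-Reply-To"] is None:
        return False  # not a reply at all
    return normalize_subject(msg["Subject"]) == normalize_subject(parent_subject)
```

A reply titled "Re: Attempts at establishing harmful conventions" stays in that thread, while one retitled "Re: Threading heuristics" is split off despite its In-Reply-To header.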

Unfortunately, retrieving and analysing full bodies are expensive
operations. For example, think of an IMAP server where the MUA works
with headers only and downloads bodies only when absolutely necessary.

So Bayesian analysis for the purpose of thread identification is not
something that can be plugged in easily; it needs a support framework
(i.e. inexpensive availability of the full message, caching of computed
scores, etc.).
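A minimal sketch of the caching part of such a framework, assuming the body fetch and the scoring function are supplied by the caller (for example, a wrapper around an IMAP body fetch and a body-token Bayesian score). The class and its methods are hypothetical; the point is that each body is retrieved at most once and its score is remembered by Message-ID.

```python
from typing import Callable, Dict


class ThreadScoreCache:
    """Hypothetical support layer: fetch a body at most once per message
    and remember the computed score, keyed by Message-ID."""

    def __init__(self, fetch_body: Callable[[str], str],
                 score: Callable[[str], float]) -> None:
        self._fetch_body = fetch_body  # assumed: expensive retrieval, e.g. over IMAP
        self._score = score            # assumed: e.g. a Bayesian thread score
        self._scores: Dict[str, float] = {}
        self.fetches = 0               # number of expensive body retrievals so far

    def get(self, message_id: str) -> float:
        # Only fetch and score on a cache miss.
        if message_id not in self._scores:
            self.fetches += 1
            body = self._fetch_body(message_id)
            self._scores[message_id] = self._score(body)
        return self._scores[message_id]

    def invalidate(self, message_id: str) -> None:
        # Drop a cached score, e.g. after the thread model has been retrained.
        self._scores.pop(message_id, None)
```

Asking for the same message's score twice triggers only one body retrieval; a real MUA would also need to persist the cache and decide when scores go stale.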