Re: Multi-message reply and "References:"


I've been quietly consulting with my colleagues here at Lotus, and what
I've heard back pretty much matches my intuition, which is that we're
better off with a new header that spells out the thread map
unambiguously.  Anyway, herewith the comments from Lotus' thread
specialist.  -- Nathaniel

From: Steven Rohall <steven_rohall(_at_)us(_dot_)ibm(_dot_)com>
Subject: Re: Multi-message reply and "References:"

Nathaniel,

Thanks for forwarding this thread (no pun intended) of messages to me.
Basically, I have to agree with Dave.

Although I think that Pete's original proposal could be made to work, I
feel that the addition of semantics on the References header and the
goal of remaining backward compatible are at odds.  As Dave points out,
References would need to be augmented with duplicate message IDs or
special "token" message IDs (e.g., left-p@ren and right-p@ren).  This is
ugly and error-prone.  And, if a separate "ThreadsHead" header is
necessary to make it work, "smart" UAs now need to correlate data from
two headers to compute the threads.  If a separate header is necessary,
why not go all the way and define a Threads header that really does what
is needed:  represent DAGs in a clear and concise manner?

As one who has been involved in thread "calculation" for some time, I
think the biggest advantage of such a scheme is that we can take the
guesswork out of threads.  In particular, if we define a Threads header
that contains all of the lineage information known at message
composition, then UAs don't need to calculate threads in some arbitrary
fashion at message reception; instead, the threads are precisely known.
If a user composes a "new" message, then that message is the root of a
new thread tree and the Threads header simply contains that message's
message-id.  Replies to that message simply extend the Threads header
with their own information.  Indeed, the threads are known despite a user
having deleted messages (or not having saved sent messages)--deleted
messages are a real problem for post hoc thread computation and force
UAs to rely on subject-based matching, for example.  (Subject-based
matching in a UA may be very useful, but it is not the same thing as a
thread.)  Subsequent replies simply augment the Threads header without
regard to which messages may or may not still be in the user's mail
database.
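
To make the idea concrete, here is a minimal sketch in Python.  It assumes
a purely hypothetical in-memory form for the Threads header: a mapping from
each known message-id to the message-ids it replied to.  No wire syntax has
been defined in this discussion, and the function names (new_message, reply,
ancestors) and the example message-ids are illustrative assumptions only.

```python
# Hypothetical in-memory form of a "Threads" header: each known message-id
# maps to the tuple of message-ids it replied to (empty for a thread root).
# This is an illustration only, not a proposed header syntax.

def new_message(msg_id):
    """A brand-new message is the root of its own thread."""
    return {msg_id: ()}


def reply(msg_id, *parents):
    """Build the lineage for a reply to one or more messages.

    Each parent is a (parent_id, parent_lineage) pair, where parent_lineage
    is the lineage carried by that parent message.
    """
    lineage = {}
    for _parent_id, parent_lineage in parents:
        lineage.update(parent_lineage)  # inherit everything the parent knew
    lineage[msg_id] = tuple(parent_id for parent_id, _ in parents)
    return lineage


def ancestors(lineage, msg_id):
    """Return every message-id reachable from msg_id via reply edges."""
    seen = set()
    stack = list(lineage.get(msg_id, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen


# A root, two independent replies, and one multi-message reply to both.
root = new_message("a@example")
b = reply("b@example", ("a@example", root))
c = reply("c@example", ("a@example", root))
d = reply("d@example", ("b@example", b), ("c@example", c))
```

In this sketch the lineage carried by "d@example" alone is enough to
recover the whole DAG, so a recipient who has deleted "b@example" or
"c@example" can still place the message in its thread.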

The only downside I can see to this proposal (and indeed, it is
potentially a problem with Pete's original proposal) is that the length
of the Threads header (or the References header in Pete's case) can get
long.  Is there going to be a problem with MTAs truncating the header?
Maybe, but I think I'd take a wait and see attitude.  A recent study of
ours (of ~42000 messages across 8 users) shows that 80% of email
threads (old-fashioned ones, not Pete's fancy ones) are 5 or fewer
nodes.  The 97% point was at 16 nodes.
                               -Steve.

Steven L. Rohall
Software Architect, IBM Research
One Rogers Street, 5120S, Cambridge, MA 02142
phone: (617)693-1840, fax: (617)693-5551
steven_rohall(_at_)us(_dot_)ibm(_dot_)com