Re: Multi-message reply and "References:"

2003-06-07 09:34:33

When you reply to a message, you put that message's message-id in the In-Reply-To header. You copy any References header from that message into the new message, and append the original message-id to it. So if I have messages 1 (the original), 2 (a reply to 1) and 3 (a reply to 2), the headers look like:

    Message-Id: 1

    Message-Id: 2
    In-Reply-To: 1
    References: 1

    Message-Id: 3
    In-Reply-To: 2
    References: 1 2
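
For illustration, the rule above can be sketched in Python for the single-parent case. The function name `reply_headers` and the dict representation of a header are hypothetical, and msg-ids are bare tokens as in the examples:

```python
def reply_headers(new_id, parent):
    """Build the threading fields for a reply to one parent message.

    `parent` is a dict of the parent's header fields (a made-up
    representation for this sketch); msg-ids are plain strings.
    """
    parent_id = parent["Message-Id"]
    # Copy the parent's References (if any) and append the parent's own id.
    refs = parent.get("References", "").split() + [parent_id]
    return {
        "Message-Id": new_id,
        "In-Reply-To": parent_id,
        "References": " ".join(refs),
    }


m1 = {"Message-Id": "1"}
m2 = reply_headers("2", m1)
m3 = reply_headers("3", m2)
print(m3["In-Reply-To"], "|", m3["References"])
```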

Note that a References header field may appear in contexts other
than a reply: RFC 2046 section specifically permits use
of References with message/partial media types representing
fragments of an original message.  So a large message with id X
may be split for transport into fragments:

Message-ID: 1
Body contains header of original message, including the Message-ID: X
and any References fields (N.B. these are in the *body* of message 1).

Message-ID: 2
References: 1

Message-ID: A
Body contains fields of fragment 3, which has been further split,
including (in the *body*) Message-ID: 3 and References: 2.

Message-ID: B
References: A

Message-ID: 4
References: 3

Note that if transport does not reassemble the message from fragments,
a UA may need to examine body content of messages A and 1 in order to
resolve references.  Also note that some message Y may have a field
References: W X
and that will again require the UA to examine body content (of message 1)
in order to find X in the absence of reassembly by the transport layer.

Reassembly can only reliably take place at the final stage of transport
(since fragments may take different routes, not all fragments are available
at intermediate hosts), and as noted in RFC 2821, it is not always possible
for an MTA to determine whether or not it is making final delivery. So
UAs need to cope with fragments.
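
As a rough sketch of the first step a UA would take, the Python standard library's email package can tell a fragment from an ordinary message by its Content-Type. The sample header, its addresses, and the helper name `fragment_info` are invented for illustration; reassembly itself is not shown:

```python
from email.parser import HeaderParser

# Invented sample fragment; only the header is of interest here.
raw = (
    "Message-ID: <2@example.invalid>\n"
    "References: <1@example.invalid>\n"
    'Content-Type: message/partial; id="X@example.invalid"; number=2; total=4\n'
    "\n"
    "second fragment of the original body\n"
)


def fragment_info(msg):
    """Return (id, number) if msg is a message/partial fragment, else None."""
    if msg.get_content_type() != "message/partial":
        return None
    # get_param() reads a parameter of the Content-Type field.
    return msg.get_param("id"), int(msg.get_param("number"))


info = fragment_info(HeaderParser().parsestr(raw))
print(info)
```

A References field on a message for which this returns a value links fragments, not replies, so it should be kept out of reply threading until the original header has been recovered from the first fragment.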

Now let's suppose we have message 4 as a reply to 1, and 5 as a reply to 4. Those messages look like:

    Message-Id: 4
    In-Reply-To: 1
    References: 1

    Message-Id: 5
    In-Reply-To: 4
    References: 1 4

While RFCs 1036 and 2822 suggest ordering of the msg-ids in the References
field, there is no requirement to use a specific ordering.  In particular,
UAs should not presume that the order of appearance of msg-ids in a References
field is significant.  So in the example above,

Message-ID: 5
In-Reply-To: 4
References: 4 1

is legal. A UA wishing to construct an ordering of messages 1, 4, and 5
should examine the header fields of messages 1 and 4 as well as 5, and
the order in which it examines them doesn't matter.
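
A small Python sketch of that approach, assuming all three messages are available to the UA: the References field of message 5 is treated as an unordered set, and the ordering comes from each message's own In-Reply-To field. The `headers` table and the `depth` helper are made up for the example:

```python
# Header fields of the three messages; note that message 5 lists "4 1".
headers = {
    "1": {},
    "4": {"In-Reply-To": "1", "References": "1"},
    "5": {"In-Reply-To": "4", "References": "4 1"},
}


def depth(msg_id):
    """Number of reply steps from msg_id back to a thread root."""
    parents = headers[msg_id].get("In-Reply-To", "").split()
    return 0 if not parents else 1 + max(depth(p) for p in parents)


# The References field only says which messages are related, not their order.
related = set(headers["5"]["References"].split()) | {"5"}
ordered = sorted(related, key=depth)
print(ordered)
```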

Now, through the magic of the content concentrator and multiple reply, we reply to 3 and 5 simultaneously. The resulting message should be constructed as follows:

    Message-Id: 6
    In-Reply-To: 3 5
    References: 1 2 3 1 4 5

In this case (and subsequent examples), repeating a msg-id is redundant;
ordering of the msg-ids in the References fields of messages 3 and/or 5
might not be as is assumed, and a UA will have to examine all of messages
1 through 6 to properly construct a partial ordering -- and there's no reason
to examine message 1's fields twice.  Repetition makes the field longer
than necessary (and it is known that some implementations will drop some
msg-ids in long References fields).  As noted by Adam Costello, 2822
discourages use of References when replying to multiple messages, and so
long as each message has its predecessor(s) listed in its In-Reply-To
field, a UA can construct a partial ordering by recursively examining
each message listed in an In-Reply-To field; the References field itself
may be redundant[*].
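
To make the recursive examination concrete, here is a minimal Python sketch for messages 1 through 6. It assumes every message is available and that its In-Reply-To field parsed cleanly; the `in_reply_to` table and the function names are hypothetical:

```python
# Predecessors taken from each message's In-Reply-To field.
in_reply_to = {
    "1": [], "2": ["1"], "3": ["2"],
    "4": ["1"], "5": ["4"], "6": ["3", "5"],
}


def ancestors(msg_id, seen=None):
    """Collect every predecessor of msg_id, visiting each message once."""
    seen = set() if seen is None else seen
    for parent in in_reply_to[msg_id]:
        if parent not in seen:
            seen.add(parent)
            ancestors(parent, seen)
    return seen


def precedes(a, b):
    """True if a is a (possibly indirect) predecessor of b."""
    return a in ancestors(b)


print(sorted(ancestors("6")))
```

Message 1 is reached through both 3 and 5 but is examined only once, and messages 3 and 5 remain unordered relative to each other, which is what makes the result a partial ordering.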

First complication: replying to a combined thread. Suppose message 8 is a reply to 7. We think its References header should reflect all the threads it continues, and therefore should be constructed thus:

    Message-Id: 8
    In-Reply-To: 7
    References: 1 2 3 7 1 4 5 7

You know that the threads have been combined because 1 appears twice, and you can figure out the sequencing.

That presumes that every generator of References fields has used the same
algorithm and that no generator or transport has dropped any msg-ids. It
is known that ids are dropped by some implementations, so the implicit
presumptions are not valid.

Second complication: replying to combined DISTINCT threads.

Suppose we have message A, B as a reply to A, and C as a reply to B. We now reply to 3 and C, to form message X. The references header would be generated thus, according to the above algorithm:

    Message-Id: X
    In-Reply-To: 3 C
    References: 1 2 3 A B C

The problem is that there is no way to tell that A is not a reply to 3.

Of course there is: message A does not refer to message 3 in either of its
In-Reply-To or References fields.

Here is where we will have to invent syntax. Either we make a special "new thread" message-id and insert it:

    References: 1 2 3 iamspecial A B C

That would be a serious problem. It changes the semantics, and current
UAs wouldn't use the changed semantics. It also won't work if (as is known
to be the case) msg-ids are dropped and/or if some generator uses a
different means of ordering msg-ids or a different "marker" id.

Or we add a new header:

    Thread-Heads: 1 A
    References: 1 2 3 A B C

Either option would preserve the information that 1 2 3 and A B C are distinct threads.


In-Reply-To: 3 C

suffices (the UA of course needs to examine the headers of messages 3 and C
to find 2 and B, then 1 and A; and an implementation could cache the
relationship information, so that need not be expensive) if the intermediate
messages are available. If a References field is also supplied, a UA can
still produce a partial ordering even if some messages are unavailable.
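
A sketch of that lookup in Python, assuming all intermediate messages are available. The `in_reply_to` table and the name `thread_heads` are invented here, and `functools.lru_cache` stands in for whatever cache an implementation might keep:

```python
from functools import lru_cache

# Predecessors taken from each message's In-Reply-To field.
in_reply_to = {
    "1": (), "2": ("1",), "3": ("2",),
    "A": (), "B": ("A",), "C": ("B",),
    "X": ("3", "C"),
}


@lru_cache(maxsize=None)
def thread_heads(msg_id):
    """Return the set of root messages reachable from msg_id."""
    parents = in_reply_to[msg_id]
    if not parents:
        return frozenset([msg_id])
    return frozenset().union(*(thread_heads(p) for p in parents))


heads = thread_heads("X")
print(sorted(heads))
```

Message A turns up as a head of its own thread because its In-Reply-To field is empty, so no marker id or extra header field is needed to tell that it is not a reply to 3.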

A more complicated scenario can't be readily handled by such a scheme alone,
even with the unrealistically optimistic assumptions that all generators use
the same ordering algorithm (and marker id) and that no generator or transport
ever drops any msg-id. A UA may still need to examine intermediate message
headers in order to determine the message relationships: consider original
messages 1, A, 10, and AA

Message-ID: 2
In-Reply-To: 1
References: 1

Message-ID: B
In-Reply-To: A
References: A

Message-ID: 11
In-Reply-To: 10
References: 10

Message-ID: BB
In-Reply-To: AA
References: AA

Message-ID: Q
In-Reply-To: 11 BB
References: 10 11 AA BB

One might eventually get to:

Message-ID: R
In-Reply-To: P Q

which could apply equally well to (at least 2) different relationships:

1.

Message-ID: X
In-Reply-To: 2 B

Message-ID: P
In-Reply-To: X 10

or, 2.

Message-ID: X
In-Reply-To: B

Message-ID: P
In-Reply-To: 2 X 10

If I understand the algorithm correctly, either case will result in the
same References field (and Thread-Heads if used) for message R.

Whether or not message X is a response to message 2 cannot be
determined by looking at the header of message R alone; it is necessary to
examine the header of at least one other message (the obvious preference
being message X if it's available).
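
The claim can be checked with a short Python sketch. It encodes the scheme as I read it (for each message replied to, copy its References and append its msg-id), so the `references` function is my interpretation, not a definitive statement of the proposal:

```python
def references(msg_id, in_reply_to, memo=None):
    """References per the scheme as read above: for each parent in turn,
    the parent's References followed by the parent's own msg-id."""
    memo = {} if memo is None else memo
    if msg_id not in memo:
        refs = []
        for parent in in_reply_to.get(msg_id, []):
            refs += references(parent, in_reply_to, memo) + [parent]
        memo[msg_id] = refs
    return memo[msg_id]


# In-Reply-To relationships shared by both cases (1, A, 10, AA are originals).
common = {
    "2": ["1"], "B": ["A"], "11": ["10"], "BB": ["AA"],
    "Q": ["11", "BB"], "R": ["P", "Q"],
}
case1 = dict(common, X=["2", "B"], P=["X", "10"])
case2 = dict(common, X=["B"], P=["2", "X", "10"])

refs1 = references("R", case1)
refs2 = references("R", case2)
print(" ".join(refs1))
print(refs1 == refs2)
```

Both cases produce the same list for R even though X has different parents in each, so the field cannot distinguish them.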


A UA which is intended to thread messages by following References and
In-Reply-To header fields will have to do several things:

1. It needs to determine whether a given References field is part of a
   reply thread or part of a series of partial message fragments. If
   the message has a Content-Type field specifying a media type of
   message/partial, it is the latter.

2. It needs to (at least virtually) reassemble fragmented messages in
   order to find the message header (which is encapsulated in the first
   part of the message/partial fragment series), which in turn is necessary
   in order to follow Message-ID, In-Reply-To, and References fields. It's
   also necessary in order to display the correct information for the
   original message (as opposed to uninteresting information on a series
   of fragments).

3. It appears to be necessary to examine the Message-ID, In-Reply-To, and
   References fields of each message (including reassembled fragmented
   ones) in order to construct a relationship graph of the messages. At
   least it is not always possible to unambiguously determine the
   relationships using the proposed methods (one could construct a graph
   if the References field contained all of the embedded relationships,
   e.g. if it consisted of ordered pairs of related msg-ids, which would
   lead to quite long fields).  Examining each message header is not a big
   deal; it is obviously necessary in any event for the UA to be able to
   display subject, date, sender, etc. for each message.

4. The message relationships will yield a partial ordering; the UA might
   use additional information to produce a repeatable full ordering for

[*] Unfortunately, there is some software that so badly botches In-Reply-To
fields that they become unparseable. E.g.:

X-Mailer: Mutt 1.0.1i
In-Reply-To: <20010715020400.374B8131@proven.weird.com>;
    from woods@weird.com on Sat, Jul 14, 2001 at 10:04:00PM -0400

is a relatively mild example. Even allowing phrases in In-Reply-To,
unquoted semicolon, @, commas, and colons aren't valid in a phrase. In
this case, the References field carries redundant information which
provides backup for the botched In-Reply-To field. References also
accounts for cases where some intermediate messages might not be
available to the UA; however repetition of msg-ids is still unnecessary
and because of dropped ids and ordering variation a UA will still need
to examine the available messages to correctly construct an ordering.
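
One way a UA might cope with such fields is sketched below in Python. The pattern is deliberately loose and is not the RFC 2822 msg-id grammar; the function name and the sample values (using example.invalid) are invented:

```python
import re

# Anything of the form <local@domain> is taken to be a msg-id.
MSG_ID = re.compile(r"<[^<>\s@]+@[^<>\s@]+>")


def candidate_parents(in_reply_to, references=""):
    """Extract msg-ids from In-Reply-To, falling back to References.

    The fallback returns every id found in References as a candidate;
    no significance is attached to the order in which they appear.
    """
    ids = MSG_ID.findall(in_reply_to)
    return ids if ids else MSG_ID.findall(references)


botched = (
    "<20010715020400.374B8131@example.invalid>; "
    "from user@example.invalid on Sat, Jul 14, 2001 at 10:04:00PM -0400"
)
print(candidate_parents(botched))
```

The candidates found this way still have to be checked against the headers of the available messages before an ordering is constructed.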