Improving the handling of conversations in Internet mail

Page 1  Palme: Issues when designing filters in message systems

Below are three proposals for improving the handling of
conversations in Internet mail. A conversation here means a
set of messages directly or indirectly related to each other
by In-Reply-To, References and Obsoletes heading fields.
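
The definition above can be illustrated with a small sketch.
It assumes that each message is already reduced to its own
Message-ID plus the list of Message-ID-s named in its
In-Reply-To, References and Obsoletes fields. The function
name and the input layout are hypothetical, chosen only for
this illustration.

```python
def conversations(messages):
    """Group messages into conversations.

    messages: dict mapping a Message-ID to the list of Message-ID-s
    it names in In-Reply-To, References and Obsoletes (assumed layout).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for msg_id, refs in messages.items():
        find(msg_id)
        for ref in refs:
            # A direct relation joins the two messages' groups.
            parent[find(ref)] = find(msg_id)

    groups = {}
    for msg_id in messages:
        groups.setdefault(find(msg_id), set()).add(msg_id)
    return sorted(sorted(group) for group in groups.values())
```

Two messages that both refer to a third message which never
arrived still end up in the same conversation, since the
relation may be indirect.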

1. Implicit Message-ID-s

1.1 Why Implicit Message-ID-s

Messages arriving via e-mail sometimes do not have any
Message-ID, since the Message-ID is an optional field in RFC
822. It will then not be possible to include the Message-ID
of the replied-to message in In-Reply-To, References and
Obsoletes clauses in the message header. This stops the
mailing system from providing several useful services to the
user, such as the recognition of duplicates of the same
message and the ability to follow chains of messages replying
to each other. Because of this, some systems add Message-ID-s
to such messages in the distribution list expander. An
example of such a system is Listserv.

It would be an advantage if several independent message
systems used the same algorithm for producing such implicit
Message-ID-s, so that they would have a reasonably high
probability of producing the same Message-ID for the same
message.

1.2 Suggested algorithm for producing implicit Message-ID-s

This note suggests an algorithm for producing such implicit
Message-ID-s on messages lacking such ID-s in order to refer
to the messages in In-Reply-To:, References: and Obsoletes:
heading fields. I want to thank Eric Thomas for valuable
input in producing this appendix.

The algorithm should give a high probability of producing the
same Message-ID on the same message, when it arrives at
different mailers, possibly through different routes. It
should also give a low probability of producing the same
Message-ID for different messages. (One can however never,
not even with sender-generated Message-ID-s, assume that two
messages are identical even if they have the same Message-
ID.)

The most important problem to overcome is that the same
message may sometimes look different. Typical differences are
heading transformations in gateways between different message
nets, such as between Internet and an X.400-based or DEC-mail
based net. The algorithm described below tries to overcome
these problems. The algorithm might only be suitable for
messages written in languages that use the Latin alphabet.

(a)     Take the value in the From: heading field. Make the
first part of the Message-ID into a checksum of the part
before the first "@", "::" or "%", whichever comes first, in
the From: value. If the checksummed part contains a "!", skip
whatever goes before this character before computing the
checksum.

(b)     Take the value in the Date: heading field and make the
second part of the Message-ID into a checksum of this
datetime.

(c)     Take the value in the Subject: field, convert it to IA5
(=7-bit ASCII) by replacing all non-ASCII characters with
spaces, and make the third part of the Message-ID into a
checksum of the result. Note that if the Subject: field is
coded in T.61 or according to RFC 1342, more than one byte in
the Subject: field would represent a single printed character
and thus be replaced by space.

(d)     Compute a checksum according to rule (e) for the body
of the message and make it the fourth part of the Message-ID.

(e)     The algorithm for computing the checksum in (a), (b),
(c) and (d) is to first skip all white space and formatting
characters (Space, Tab, Return, Line Feed, Form Feed, Delete,
other control characters with value less than 32), then
compute an xor on the bits of each character, after
multiplying the character with 2*n where n is the character
position (from the beginning of what is left of the string,
first character has position 1) modulo 20. Then convert the
checksum into a BASE64 string.

(f)     Set the Message-ID to the concatenation of the
checksums from (a), (b), (c) and (d), each computed with the
algorithm in (e), followed by "@" and the special hostname
"V1.checksum". If a field to be checksummed is missing, put a
single "?" character in place of its checksum.
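
Steps (a) to (f) might be sketched as follows. This is only one
possible reading of the rules, not a reference implementation.
The function names are hypothetical, and the sketch makes these
assumptions, which the text above does not settle: n is taken as
the character position modulo 20 and the character code is
multiplied by 2*n exactly as written; the numeric checksum is
turned into the shortest big-endian byte string before BASE64
encoding; when a From: value holds several "!" characters, the
text after the last one is kept; and the From: value is passed in
as a bare address without a display name.

```python
import base64
import re


def checksum(text):
    """Rule (e), read literally."""
    # Skip space, control characters below 32, and Delete (127).
    kept = [c for c in text if ord(c) > 32 and ord(c) != 127]
    acc = 0
    for position, c in enumerate(kept, start=1):
        n = position % 20            # assumption: n = position modulo 20
        acc ^= ord(c) * 2 * n        # xor of the multiplied character codes
    # Assumption: shortest big-endian byte string, then BASE64.
    raw = acc.to_bytes(max(1, (acc.bit_length() + 7) // 8), "big")
    return base64.b64encode(raw).decode("ascii")


def local_part(from_value):
    """Rule (a): the part of From: that gets checksummed."""
    # Cut at the first "@", "::" or "%", whichever comes first.
    head = re.split(r"@|::|%", from_value, maxsplit=1)[0]
    # Assumption: with several "!", keep what follows the last one.
    return head.rsplit("!", 1)[-1]


def part(value):
    """Rule (f): a missing field contributes a single "?"."""
    return checksum(value) if value else "?"


def implicit_message_id(from_value, date_value, subject, body):
    # Rule (c): non-ASCII characters in the subject become spaces.
    ascii_subject = "".join(c if ord(c) < 128 else " " for c in subject or "")
    parts = [
        part(local_part(from_value) if from_value else None),  # (a)
        part(date_value),                                      # (b)
        part(ascii_subject),                                   # (c)
        part(body),                                            # (d)
    ]
    return "".join(parts) + "@V1.checksum"
```

Since white space is skipped before checksumming, two copies of a
message whose lines were re-wrapped on the way, or whose From:
field gained a "!" route prefix, still get the same implicit
Message-ID under this reading.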

2. A new heading field Followup-To

2.1 Why the Followup-To field is needed

A person who writes a reply is allowed to send the reply to
any recipients s/he wants to. The "Reply-To:" field is,
however, a request from the sender that replies intended for
the originator of the message be sent to this address instead
of to the address in the "From:" heading field.

People sometimes want to send replies not only to the
originator of a message, but to the whole group of people who
read the replied-to message. Normally, this is done by
concatenating the names in the "From", "To", "Cc" and maybe
"Bcc" fields when sending the reply. In this case "Reply-To"
indicates a replacement for "From" in this list, not a
replacement for the whole concatenated list.

Sometimes people want to explicitly say that they do not want
to get replies, even though replies are normally sent to all
who read the replied-to message. This is useful for example
when a message is sent to more than one mailing list or
newsgroup, and one wants further discussion to take place in
only one of the lists/newsgroups. This is also useful when
someone is only interested in reading a message for a
particular purpose, and the forthcoming discussion is not on
that particular topic. Thus, there is a need to be able to
indicate a replacement address for those replies which
otherwise would be sent to all recipients of the replied-to
message. Usenet News has a special heading field for this,
the "Followup-To" heading field. I propose that RFC 822 be
extended with this heading field, since it would be equally
useful in e-mail as in the Netnews environment.
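
How a mail program might choose reply recipients under this
proposal can be sketched as follows. The function name and the
layout of the header table (a dict mapping a field name to a list
of addresses) are assumptions made for the illustration, and
"Bcc" is left out for simplicity.

```python
def reply_recipients(headers, to_all):
    """Choose the recipients of a reply.

    headers: dict mapping a field name to a list of addresses
    (assumed layout). to_all: True for a reply meant for everyone
    who read the replied-to message, False for a reply to the
    originator only.
    """
    # Reply-To replaces From, and nothing else.
    author = headers.get("Reply-To") or headers.get("From", [])
    if not to_all:
        return list(author)
    # Followup-To replaces the whole concatenated list.
    if headers.get("Followup-To"):
        return list(headers["Followup-To"])
    seen, result = set(), []
    for addr in list(author) + headers.get("To", []) + headers.get("Cc", []):
        if addr not in seen:      # drop duplicates, keep the order
            seen.add(addr)
            result.append(addr)
    return result
```

The sketch keeps the two fields apart: "Reply-To" only affects
where the originator is reached, while "Followup-To" only affects
replies that would otherwise go to all recipients.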

3. Use of In-Reply-To versus References

Usenet News has a convention that the "In-Reply-To" header
field is used for personal replies, mainly intended for the
author (but which might be sent by mail to some other
recipients in some cases), while the "References" header
field is used for group replies intended for all participants
in a discussion. Such a convention is valuable also in e-
mail, since it allows the recipient to filter differently on
these two categories of replies.

I thus suggest that RFC 822 be extended with such a
recommendation.
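
A recipient-side filter built on this convention might look like
the sketch below. The function name, the category labels and the
header table layout are hypothetical. Letting "References" take
precedence when both fields are present is an assumption; the
convention itself does not cover that case.

```python
def reply_kind(headers):
    """Classify a message by the reference field it carries.

    headers: dict mapping a field name to its value (assumed layout).
    """
    if headers.get("References"):
        return "group"        # reply intended for all participants
    if headers.get("In-Reply-To"):
        return "personal"     # reply mainly intended for the author
    return "not-a-reply"
```

A filter could then, for example, show "personal" replies at once
and store "group" replies with the rest of the conversation.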