Re: New Internet Draft: draft-duerst-archived-at-00.txt


On Tue October 26 2004 18:05, Keith Moore wrote:

> Or at least, if a message is going to point to an
> archive of itself, it should be a faithful archive.


That raises several new issues with Martin's draft. They can be
summarized as "what is the utility of the proposed field: who
will use it, under what circumstances, and for what purpose(s)?"
Note that this is not a question about syntax, semantics, or
validity, but about utility.

I also discuss a security issue towards the end of this message.

Some details about the utility issues:

1. Since the proposed field is contained within a message header
   and purportedly points to an archive of that message, it seems
   of little utility except in highly unusual circumstances (see below).
   If I already have the entire message at my disposal, why should
   I want a field that tells me where I can find a copy of that same
   message? I already have the message, so there's little point in
   parsing it to find that field, parsing the field to get a URI, parsing
   that URI to get a scheme/authority/path/query/fragment, looking
   up the scheme to find a suitable protocol, locating the server,
   making a connection (if not offline), etc., only to eventually
   retrieve a copy of what I already have.  And if I don't have the
   message, I probably don't have that field.  In short, if I have the
   message, the field is superfluous, and if I don't have the message,
   I cannot make use of the field to get the message.  That is in
   contrast to something like message/external-body, where some
   content (which could well be an entire message or even a digest
   of messages) can be accessed via some specified mechanism(s)
   without having to already have the desired content at hand;
   i.e. one message contains information for obtaining a
   different message.
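To make the chain of steps concrete, here is a minimal Python sketch of the first few things a reader's software would have to do to use the field. The sample message, its archive URL, and the helper name archived_at_target are hypothetical; only the standard library's email and urllib.parse modules are used, and the network steps are left as a comment.

```python
from email import message_from_string
from urllib.parse import urlsplit

# Hypothetical sample message carrying the proposed field.
RAW = (
    "From: foo@bar.example.com\n"
    "To: list@example.net\n"
    "Subject: example\n"
    "Archived-At: <http://archive.example.org/msg/123>\n"
    "\n"
    "Body text.\n"
)


def archived_at_target(raw: str):
    """Follow the first steps needed to use Archived-At.

    Returns the split URI, or None if the field is absent. Note that
    the input is already the complete message.
    """
    msg = message_from_string(raw)           # parse the message
    value = msg.get("Archived-At")           # find the field
    if value is None:
        return None
    uri = value.strip().strip("<>")          # drop the <...> delimiters
    return urlsplit(uri)                     # scheme/authority/path/query/fragment


parts = archived_at_target(RAW)
print(parts.scheme, parts.netloc, parts.path)  # -> http archive.example.org /msg/123
# Remaining steps (not shown): map parts.scheme to a protocol, locate
# parts.netloc, connect, and retrieve... a copy of RAW, already in hand.
```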

2. There is only one circumstance that I can think of where one
   might have the proposed field but not the entire message, and
   even that depends, for proper operation, on details which the
   draft under discussion does not currently specify. The situation
   would be a message/partial fragment
   numbered 1 containing the encapsulated message header in the
   absence of one or more fragments (RFC 2046). Unlike RFCs
   2298 and 3798, the subject draft does not amend RFC 2046's
   rules regarding which header fields are reconstituted from the
   encapsulated message header vs. from the header of the first
   fragment message which encapsulates it.  Lacking such an
   amendment, RFC 2046's rules require that the field be taken
   from the enclosing message, which I believe is undesirable.
   Consider that the entire original message and that first
   fragment (which are different messages) might both be
   archived (probably at different places).  In that case, the
   message header of the first fragment will contain (at least)
   two of the proposed fields, one referring to an archive of the
   entire message and the other referring to an archive of just
   the first fragment message -- with no way to tell which is
   which.  And the resulting "reassembled" message will also
   contain those fields, one of which points to an archive of the
   complete message, while the other points to a completely
   different message (the first fragment).  And I would add that
   fragmentation/reassembly is not perfectly transparent, so
   neither archive will be a faithful copy of the "reassembled"
   message.  The alternative, which would have to be provided
   for by an amendment to RFC 2046 rules similar to that in
   RFCs 2298/3798, is that the proposed field would be obtained
   from the encapsulated message's header.  In either case (the
   unamended RFC 2046 rules which are problematic for the
   proposed field, or with an amendment), one would have the
   proposed field if and only if one had the first fragment message;
   it would be unavailable if that fragment was missing.  The
   combination of circumstances in which the proposed field
   might have some use is:
   a) the message would have to have had the field added
   b) the message would have to have been fragmented after
       the field was added
   c) one or more of the fragments, but not the first fragment,
       would have to be missing
   d) the purported archive of the complete message would
       have to in fact exist and be accessible
   e) for full utility, the archive would have to be in RFC [2]822
       format (see Keith's remarks)
   f) none of the differences which might be expected to exist
       between the archive and the reconstituted message (had
       all fragments been available) could be significant (Received
       or other trace fields, incidental details like time of arrival,
       etc.).
   g,h) fragmentation and reassembly would both have to have been
      performed in accordance with an amended set of rules;
      otherwise one is likely to retrieve a copy of an already-
      available fragment.
   That is rather a large number of necessary conditions, several
   of which are likely to be rare occurrences.
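The header-merging question can be modeled in a few lines. This Python sketch is based on a reading of RFC 2046's message/partial reassembly rules: enclosing-header fields are kept except Content-* and Subject, Message-ID, Encrypted, and MIME-Version; from the encapsulated header only those fields are appended and the rest are dropped. The extra_inner parameter stands in for a hypothetical amendment in the spirit of RFCs 2298/3798. The sample headers, URLs, and function names are invented for illustration.

```python
from typing import Iterable, List, Tuple

Header = List[Tuple[str, str]]

# Non-Content-* fields that the reassembly rules take from the
# encapsulated header rather than from the enclosing header.
INNER_NAMED = {"subject", "message-id", "encrypted", "mime-version"}


def reassemble_header(enclosing: Header, encapsulated: Header,
                      extra_inner: Iterable[str] = ()) -> Header:
    """Merge fragment-1 headers when reassembling message/partial.

    extra_inner models a hypothetical amendment naming further
    fields (e.g. Archived-At) to take from the encapsulated header.
    """
    names = INNER_NAMED | {n.lower() for n in extra_inner}

    def from_inner(name: str) -> bool:
        n = name.lower()
        return n.startswith("content-") or n in names

    # Keep enclosing fields, except Content-* and the named fields.
    merged = [(n, v) for n, v in enclosing if not from_inner(n)]
    # Append only Content-* and named fields from the encapsulated
    # header; its other fields are dropped.
    merged += [(n, v) for n, v in encapsulated if from_inner(n)]
    return merged


def values(header: Header, name: str) -> List[str]:
    return [v for n, v in header if n.lower() == name.lower()]


# Hypothetical fragment 1: its own header holds two Archived-At
# fields, and nothing says which archive is which.
fragment1 = [
    ("From", "foo@bar.example.com"),
    ("Archived-At", "<http://a.example.org/whole>"),
    ("Archived-At", "<http://b.example.org/fragment1>"),
    ("Subject", "big message (part 1 of 3)"),
    ("Content-Type", 'message/partial; id="xyz"; number=1; total=3'),
]
inner = [
    ("Subject", "big message"),
    ("Archived-At", "<http://a.example.org/whole>"),
    ("Content-Type", "text/plain"),
]

# Unamended rules: both fields survive into the "reassembled" message.
print(values(reassemble_header(fragment1, inner), "Archived-At"))
# -> ['<http://a.example.org/whole>', '<http://b.example.org/fragment1>']
# Hypothetical amendment: only the encapsulated field survives.
print(values(reassemble_header(fragment1, inner, ["Archived-At"]), "Archived-At"))
# -> ['<http://a.example.org/whole>']
```

Either way, the field exists only if fragment 1 arrived, which is the point made above.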

Issues of syntax, semantics, and validity aside, I'm not convinced
that the proposed field is useful except in a tiny fraction of cases
that result from a combination of unusual circumstances.  If a
message is so large that fragmentation would occur, putting the
message on a server and sending retrieval instructions would be
preferable in most instances, and there are existing mechanisms
to handle that scenario (simply sending a URI as text, or using
message/external-body).  Providing a pointer to a message from
some medium other than that message is not unreasonable, and
there are a number of standardized mechanisms for that. However,
either something is fundamentally wrong with the concept of
putting a pointer to a message within the referenced message or
the author needs to provide some explanation in the draft as to
how and why he deems that to be a useful capability.
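For comparison, the existing alternative is easy to produce. Below is a minimal Python sketch of a message/external-body pointer using the URL access-type defined in RFC 2017; the helper name, addresses, and URLs are placeholders, and the result is parsed back with the standard library only to check the parameters.

```python
from email import message_from_string


def external_body_pointer(url: str, content_id: str, sender: str,
                          to: str, subject: str) -> str:
    """Return a message that points to content held on a server.

    Uses access-type=URL (RFC 2017). The phantom header carries a
    Content-ID, which MIME caching keys on.
    """
    return (
        f"From: {sender}\n"
        f"To: {to}\n"
        f"Subject: {subject}\n"
        "MIME-Version: 1.0\n"
        "Content-Type: message/external-body; access-type=URL;\n"
        f' URL="{url}"\n'
        "\n"
        "Content-Type: message/rfc822\n"
        f"Content-ID: <{content_id}>\n"
        "\n"
    )


raw = external_body_pointer(
    "http://server.example.org/big-message",
    "big-message@server.example.org",
    "foo@bar.example.com", "list@example.net", "big message: see server")
msg = message_from_string(raw)
print(msg.get_content_type(), msg.get_param("access-type"), msg.get_param("url"))
# -> message/external-body URL http://server.example.org/big-message
```

Unlike Archived-At, the pointer here travels in a different message from the content it references.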

Self-reference can lead to security issues. Consider:

   From: foo@bar.example.com
   To: list@example.net
   Date: 27 Oct 2004 08:09:10 -0500
   Subject: don't try this at home
   Archived-At: <http://bogus.server.example.org/123xyz>
   MIME-Version: 1.0
   Content-Type: message/external-body; access-type=URL;
      URL="http://bogus.server.example.org/123xyz"

   Content-Type: message/rfc822
   Content-ID: <123xyz@bogus.server.example.org>

   This is not really the body.

Note that this is a message/external-body message, and that
the URI in the proposed Archived-At field points to the same
location as the external body reference.  Such a message
constitutes a two-pronged denial of service attack by the
sender against the recipient and against the specified server.
Such an attack is of course possible without the Archived-At
field; Archived-At provides an additional attack vector.  The
draft does not specify whether automated use of the Archived-At
field to retrieve a message is prohibited or encouraged.  If it
is not prohibited, network damage (i.e. the denial of service
scenario described here and the resulting network congestion)
can occur.

It is not difficult to see how such an attack can be extended
to a multi-pronged attack on multiple servers (construct a loop
of messages with external references); the Archived-At fields
need not refer to the same URI as the external-body reference
in the same message, and the number of attack vectors increases
with each additional message (and Archived-At field) involved.

Note that MIME has provision for caching and requires a
Content-ID field for use with caching; the draft under
discussion does not mention caching at all and makes no
provision for an identifier that might be used to facilitate
caching mechanisms.

Note also that message/external-body is not necessary for the
security issue mentioned to arise; any automatic external
reference mechanism with no provision for caching (or where
caching can be disabled or is ineffective) is vulnerable.
Text/html, message/http, and application/pdf are examples of
other media types that have provision for external references.
Archived-At is inherently both an external reference and
self-referential, which opens the door to this security issue
even in the absence of any vulnerability specific to a media
type.  Those media types are mentioned here to illustrate the
mechanisms, to show how the proposed field exacerbates existing
problems, and to facilitate discussion of the protection that
properly implemented MIME mechanisms afford against these
vulnerabilities; the draft under discussion makes no
corresponding provision.
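The loop can be simulated without touching a network. In this Python sketch, which is a toy model rather than a real retriever, a dictionary stands in for the servers, budget caps retrievals so that an unbounded loop shows up as budget exhaustion, a Content-ID keyed dictionary stands in for MIME caching, and Archived-At references carry no Content-ID because the draft provides no such identifier. All names and URLs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ref:
    """One automatic external reference found in a message."""
    uri: str
    content_id: Optional[str]  # external-body has one; Archived-At has none


@dataclass
class Msg:
    refs: List[Ref] = field(default_factory=list)


def auto_fetch(refs: List[Ref], world: Dict[str, Msg], budget: int,
               cache_by_cid: bool, remember_uris: bool) -> int:
    """Follow references automatically; return how many fetches were made."""
    cache: Dict[str, Msg] = {}
    seen: set = set()
    pending = list(refs)
    fetches = 0
    while pending and fetches < budget:
        ref = pending.pop(0)
        if cache_by_cid and ref.content_id in cache:
            continue                      # MIME-style cache hit, no traffic
        if remember_uris:
            if ref.uri in seen:
                continue                  # already retrieved: loop broken
            seen.add(ref.uri)
        msg = world[ref.uri]              # one network retrieval
        fetches += 1
        if cache_by_cid and ref.content_id is not None:
            cache[ref.content_id] = msg
        pending.extend(msg.refs)          # the fetched copy has refs too
    return fetches


A, B = "http://a.example.org/1", "http://b.example.org/2"
CID_A, CID_B = "<1@a.example.org>", "<2@b.example.org>"

# Two messages whose external bodies point at each other.
ext_only = {A: Msg([Ref(B, CID_B)]), B: Msg([Ref(A, CID_A)])}
# The same loop, each message also carrying Archived-At for the same URI.
with_aa = {A: Msg([Ref(B, CID_B), Ref(B, None)]),
           B: Msg([Ref(A, CID_A), Ref(A, None)])}

print(auto_fetch(ext_only[A].refs, ext_only, 100, False, False))  # 100: budget exhausted
print(auto_fetch(ext_only[A].refs, ext_only, 100, True, False))   # 2: cache stops the loop
print(auto_fetch(with_aa[A].refs, with_aa, 100, True, False))     # 100: Archived-At bypasses cache
print(auto_fetch(with_aa[A].refs, with_aa, 100, True, True))      # 2
```

Remembering already-retrieved URIs is only one possible mitigation, chosen here for brevity; nothing like it is specified in the draft.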

The bottom line is that if there is no utility for the proposed
field, there's not much point in pursuing the draft further.
Conversely, if there is some utility, that needs to be clearly
described, and the next draft version should address
interaction with message/partial and should also more fully
address security issues.