EXTERNAL-BODY: status report

There's been a lot of discussion about message/external-body.  I'd like
to try to respond to a bunch of it and propose how to proceed.

First, Marshall Rose correctly points out that some additional
information is needed for access-type ftp.  I agree and am happy to
accommodate this.

Marshall also claims that for access type NFS, mounting information is
needed.  This is probably true, but a real mess.  I would propose to
punt on NFS for now, in favor of a file type that will be NFS-accessible
within a local site.  People rarely mount NFS over a long distance
anyway, and with AFS, which is more often used for such purposes, no
mounting information is needed.  So I think the NFS type should go away.

Several people have proposed moving external-body from message to
application.  I resist this, because I like the neat way that type
"message" lets you indicate the content-type information for the
referenced object.  To me, it's just like "message/partial" in that
regard.

Excerpts from TODO.rfc: 18-Dec-91 Re: Handling external refer.. Dave
Crocker(_at_)mordor(_dot_)stan (845)

'partial' seems quite different from 'external' since partial really does
refer to a message, albeit one that has been chopped apart, whereas
external refers to an object of unknown, and varied, type.


But given the "transparent" semantics we've defined for message/partial,
these really ARE the same cases.  If you've got a message that's too big
for normal SMTP delivery, you can either deliver it by breaking it up or
by delivering an external reference.  They're two paths to the same
thing, really.  In each case, embedding it in a message is probably NOT
what you really want to do, but it is the way our system can most
easily accomplish the delivery.  Actually, embedding it in a
multipart, e.g. "multipart/partial" or "multipart/external" is even
weirder!  I especially like the notion of "multipart/partial"!  :-)

Several people have offered variants on the theme of replacing 

Content-type: message/external-body; access-type = ftp

with

Content-type: message/ftp

or

Content-type: message/external-body-ftp

or

Content-type: message/external-body
Access-type: ftp

or something like that.  I also resist this suggestion, because (in all
but the last case) I've really come to like the way a MIME
implementation can recognize, say, that data is an IMAGE without being
able to handle the specific image format included.  In other words, I
like having general type information even for unrecognized subtypes. 
Thus I'd like to be able to say

"This message references external data, but I don't understand how to
obtain the external data using 'HORSEFEATHERS'."

rather than

"Unrecognized content-type:  message/external-body-horsefeathers"

I can give the former message more confidently with

Content-type: message/external-body; access-type=horsefeathers

Of course, this also works with Dave Crocker's "Access-type" proposal,
but I'm REALLY reluctant to add a new header field at this stage of the
game...
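
To make the point concrete, here is a minimal sketch in Python (my
choice of language, not anything the spec requires) of how a reader
might produce the friendlier message when access-type is a parameter.
The function name describe and the set KNOWN_ACCESS_TYPES are
hypothetical names invented for this illustration.

```python
from email.message import Message

# Assumption for illustration: the access types this reader can handle.
KNOWN_ACCESS_TYPES = {"ftp", "anon-ftp", "tftp", "afs"}


def describe(content_type: str) -> str:
    """Return a user-facing description for a Content-type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "message/external-body":
        # The subtype itself is unknown, so nothing more can be said.
        return f"Unrecognized content-type:  {msg.get_content_type()}"
    # The general type is understood even if the access type is not.
    access = str(msg.get_param("access-type", "")).lower()
    if access in KNOWN_ACCESS_TYPES:
        return f"External data, retrievable via {access.upper()}."
    return ("This message references external data, but I don't understand "
            f"how to obtain the external data using '{access.upper()}'.")


print(describe("message/external-body; access-type=horsefeathers"))
print(describe("message/external-body-horsefeathers"))
```

With the parameter form the reader still knows it is looking at an
external reference; with the fused subtype it knows nothing at all.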

Vincent Lau suggests adding a "permission" attribute to the
external-body type.  I have no problem with this, provided that the
default is read-only as he suggests.  I'm happy to add this to the spec.
----------------------------------------------------------------
Having said all of that, it turns out that if you buy the above
arguments, the prose doesn't have to change all that much.  Here's a new
version, for you to take pot-shots at if you are so inclined.  Most
crucial:  is this new version a show-stopper for anybody?  -- Nathaniel

7.3.3   The Message/ExternalBody subtype

The ExternalBody subtype indicates that the body or body part is not
included, but merely referenced.  In this case, the parameters describe
a mechanism for accessing the external binary data.   The set of
possible attributes includes, but is not limited to:

    NAME -- The name of a file or other token that can be used to
    reference the external body data.

    SITE -- a domain specifier for a machine or set of machines that
    are known to have access to the data file.  Asterisks may be
    used for wildcard matching to a part of a domain name, such as
    "*.bellcore.com", to indicate a set of machines on which the
    data should be directly visible, while a single asterisk may be
    used to indicate a file that is expected to be universally
    available, e.g., via a global file system.

    ACCESS-TYPE -- one or more words, comma-separated, indicating
    supported access mechanisms by which the file or data may be
    obtained.  Values include, but are not limited to, "FTP",
    "ANON-FTP", "TFTP", and "AFS".  (The value "ANON-FTP" may be
    used to specify the FTP protocol with login "anonymous".)

    EXPIRATION -- The date (with the RFC 822 "date-time" syntax)
    after which the existence of the external data is not guaranteed.

    DIRECTORY -- A directory from which the data named by NAME
    should be retrieved.  This is particularly useful for the FTP
    access-type.

    MODE -- A transfer mode for retrieving the information, with
    access-type FTP.

    PERMISSION -- A field that indicates whether or not it is
    expected that clients might also attempt to overwrite the data. 
    By default, or if permission is "read", the assumption is that
    they are not, and that if the data is retrieved once, it is
    never needed again.  If PERMISSION is "read-write", this
    assumption is invalid, and any local copy should be considered
    no more than a cache.
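
As an illustration of the SITE wildcard rule above, the following is a
minimal Python sketch of how a receiving implementation might test
whether its own host name falls under a SITE value.  The function name
site_matches is hypothetical, and two points are assumptions not stated
in the text: that matching is case-insensitive, and that an asterisk
may stand for any run of characters, including dots.

```python
import re


def site_matches(pattern: str, host: str) -> bool:
    """Return True if host falls under the SITE pattern.

    Only the asterisk is treated as a wildcard; every other
    character in the pattern must match literally.
    """
    # Escape the whole pattern, then turn each escaped asterisk
    # back into "match any run of characters".
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, host, flags=re.IGNORECASE) is not None


print(site_matches("*.bellcore.com", "thumper.bellcore.com"))
print(site_matches("*", "any.host.example"))
print(site_matches("*.bellcore.com", "thumper.example.com"))
```

Under this reading, "*.bellcore.com" does not match the bare name
"bellcore.com"; an implementation that wants that behavior would need
an extra rule.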

With the emerging possibility of very wide-area file systems, it becomes
very hard to know in advance the set of machines where a file will and
will not be accessible directly from the file system.  Therefore it may
make sense to provide both a file name, to be tried directly, and the
name of one or more sites from which the file is known to be accessible.
An implementation can try to retrieve remote files using FTP or any
other protocol, using anonymous file retrieval or prompting the user for
the necessary name and password.  If an external body is accessible via
multiple mechanisms, the sender may include multiple parts of type
message/externalbody within a part of type multipart/alternative.

However, the externalbody mechanism is not intended to be limited to
file retrieval.  One can imagine, for example, using a LISTSERV
mechanism, or using unique identifiers and a video server for external
references to video clips.  However, this memo explicitly defines only
the NAME, SITE, and ACCESS-TYPE attributes for retrieval purposes. 
Other attributes may be defined as needed.

If a message is of type "message/externalbody", then the body of the
message will contain only the header fields of the encapsulated message.
The body itself is to be found in the external location.  This means
that if the body of the "message/externalbody" message contains two
consecutive CRLFs, everything after that pair must be ignored.
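
The rule about ignoring everything after two consecutive CRLFs can be
sketched as follows in Python.  This is only an illustration under
stated assumptions: the function name inner_headers is hypothetical,
and bare LF line endings are accepted as well as CRLF, which is a
leniency the text does not call for.

```python
from email.message import Message
from email.parser import HeaderParser


def inner_headers(body: str) -> Message:
    """Parse the header fields carried in an external-body part.

    Anything after the first blank line (two consecutive line
    breaks) is discarded, as the text above requires.
    """
    normalized = body.replace("\r\n", "\n")
    header_text = normalized.split("\n\n", 1)[0]
    return HeaderParser().parsestr(header_text + "\n")


part_body = (
    "Content-type: application/postscript\r\n"
    "\r\n"
    "This text follows the blank line and is ignored.\r\n"
)
headers = inner_headers(part_body)
print(headers.get_content_type())
```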

The embedded message header fields which appear in the body of the
message/externalbody data can be used to declare the Content-type of the
external body.  Thus a complete message/externalbody message, referring
to a document in PostScript format, might look like this:

    From: Whomever
    Subject: whatever
    Content-Type: multipart/alternative; boundary=42


    --42
    Content-Type: message/externalBody; 
        name="BodyFormats.ps"; 
        site="thumper.bellcore.com"; 
        access-type = ANON-FTP;
        directory = "pub";
        mode = "image";
        expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

    Content-type: application/postscript

    --42
    Content-Type: message/externalBody; 
        name="/u/nsb/writing/rfcs/RFC-XXXX.ps"; 
        site="thumper.bellcore.com"; 
        access-type = AFS;
        expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

    Content-type: application/postscript

    --42--

Like the message/partial type, the message/externalbody type is intended
to be transparent, that is, to convey the data type in the external body
rather than to convey a message with a body of that type.  Thus the
headers on the outer and inner parts should be merged using the same
rules as for message/partial.  In particular, this means that the
Content-type header is overridden, but the From and Subject headers are
preserved.
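
A rough Python sketch of the merge described above follows.  It takes
a deliberately simplified reading, which is my assumption and not the
full message/partial rule set: any field present in the inner header
replaces the outer field of the same name, and every other outer field
is kept.  The function name merge_headers is hypothetical.

```python
from email.message import Message


def merge_headers(outer: Message, inner: Message) -> Message:
    """Combine outer and inner header fields; inner fields win."""
    merged = Message()
    inner_names = {name.lower() for name in inner.keys()}
    # Keep outer fields (From, Subject, ...) that the inner
    # header does not redefine.
    for name, value in outer.items():
        if name.lower() not in inner_names:
            merged[name] = value
    # Inner fields, such as Content-type, override the outer ones.
    for name, value in inner.items():
        merged[name] = value
    return merged


outer = Message()
outer["From"] = "Whomever"
outer["Subject"] = "whatever"
outer["Content-Type"] = "message/external-body; access-type=AFS"

inner = Message()
inner["Content-type"] = "application/postscript"

result = merge_headers(outer, inner)
print(result["From"], result["Subject"], result.get_content_type())
```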

Note that since the external bodies are not transported as mail, they
need not conform to the 7-bit and line length requirements, but might in
fact be binary files.  Thus a Content-Transfer-Encoding is not generally
necessary, though it is permitted.