ietf-822

Re: The TEXT/HTML Content Type in e-mail

1995-11-20 11:51:28
On Sun, 19 Nov 1995, Al Gilman wrote:

  are equivalent. And we should consider whether

    Content-Location: "url"

  should be written using Content-Disposition, e.g., with:

    Content-Disposition: inline; uri="url"

  since the location presented is not *actually* the location of the
  data, but rather the header says that you should treat the embedded
  content AS IF it were at that location.

That was not the way I intended it when I wrote the IETF draft.
My intention was that "Content-Location" should only be used when
it includes a real URL/URI which can be used to really retrieve
the content. In cases where the content is not available via
remote access, only as part of the e-mail message, then my
intention was to use the "file name" or "cid" method.

Hmm. Well, it isn't entirely clear in Jacob's draft how this particular URL is
supposed to be used. It talks about using it to retrieve stuff, but there is
also a fair amount of text to support its use as a kind of treatment specifier.
It also seems to me that in the event that retrieval is not possible it pretty
much has to be used as a treatment specifier.

Using content-disposition for URL information is an interesting idea, though. I
kind of like it for the "as if it were here" treatment specifier case, in that
the content-disposition field is intended to provide information useful in
processing a part. Here the URL is being used to specify handling of a
sort, so it's a pretty good fit.

I confess I don't understand the retrieval case fully. An indirect reference
needs to be done via an external-body content-type, not by adding an extra
header. We already have a proposal on the table defining such a scheme. (I will
be posting an updated version of this right after I finish this response.)

Now, there is also discussion referring to this as a "more recent" version that
should be used to update the content if possible. The semantics of this aren't
clear to me, nor is the need for such a thing clear, especially when existing
mechanisms (i.e. multipart/alternative) could be used to achieve the same
effect, albeit with a far uglier structure. For example, one semantic issue
arises when the type of the referenced object doesn't agree with the type of the
object in the message, or when HTTP is involved and multiple different versions
of the same object are available, some of which match the type of the message
object and some of which do not. This can get really messy.

One major syntax issue does arise, however, in all of these proposals. Embedding
URLs in message header fields brings up some interesting issues with regard to
line folding. (HTTP may not be concerned with this, but email applications
definitely are.)  URLs can be quite long, and mailers have to be able to fold
them. This is especially true if the URL is just one parameter value,
potentially one of many in a very long field.

This issue came up in the URL access type proposal, and I "solved" it as
follows:

  Syntax and Use of the URL parameter

  Using the ABNF notations and definitions of RFC 822 and RFC 1521, the
  syntax of the URL parameter is as follows:

       URL-parameter := <"> URL-word *(*LWSP-char URL-word) <">
                  
       URL-word := token
                   ; Must not exceed 40 characters in length

  The syntax of an actual URL string is given in RFC 1738.  URL strings
  can be of any length and can contain arbitrary character content. 
  This presents problems when URLs are embedded in MIME body part
  headers that are wrapped according to RFC 822 rules. For this reason
  they are transformed into a URL-parameter for inclusion in a
  message/external-body content-type specification as follows:

  A check is made to make sure that all occurrences of SPACE, CTLs,
  double quotes, backslashes, and 8-bit characters in the URL string are
  already encoded using the URL encoding scheme specified in RFC 1738.
  Any unencoded occurrences of these characters must be encoded.  Note
  that the result of this operation is nothing more than a different
  representation of the original URL.

  The resulting URL string is broken up into substrings of 40 characters
  or less.

  Each substring is placed in a URL-parameter string as a URL-word,
  separated by one or more spaces.  Note that the enclosing quotes are
  always required since all URLs contain one or more colons, and colons
  are tspecial characters [RFC 1521].

  Extraction of the URL string from the URL-parameter is even simpler:
  The enclosing quotes and any linear whitespace are removed and the
  remaining material is the URL string.
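
To make the two transformations above concrete, here is a rough Python sketch
of how a mailer might build and unpack a URL-parameter. It is an illustration
only, not text from the draft: the function names, the example host, and the
choice to percent-encode 8-bit characters as UTF-8 octets are all assumptions
made for the sake of the example.

```python
import re
from urllib.parse import quote

MAX_WORD = 40  # URL-word length limit stated in the draft text

# SPACE, CTLs, double quotes, backslashes, and non-ASCII (8-bit) characters
# must not appear unencoded in the URL string.
_UNSAFE = re.compile(r'[\x00-\x20\x7f"\\]|[^\x00-\x7f]')


def to_url_parameter(url: str, sep: str = " ") -> str:
    """Build a quoted URL-parameter from a URL string (hypothetical helper)."""
    # Step 1: percent-encode any unsafe characters still present.
    # Assumption: non-ASCII characters are encoded as their UTF-8 octets.
    safe = _UNSAFE.sub(lambda m: quote(m.group(0), safe=""), url)
    # Step 2: break the result into substrings of 40 characters or less.
    words = [safe[i:i + MAX_WORD] for i in range(0, len(safe), MAX_WORD)]
    # Step 3: join the URL-words with whitespace and add the enclosing quotes.
    return '"' + sep.join(words) + '"'


def from_url_parameter(param: str) -> str:
    """Recover the URL string from a URL-parameter (hypothetical helper)."""
    inner = param.strip()
    if len(inner) < 2 or inner[0] != '"' or inner[-1] != '"':
        raise ValueError("URL-parameter must be enclosed in double quotes")
    # Remove the enclosing quotes and all linear whitespace, including folds.
    return re.sub(r"[ \t\r\n]+", "", inner[1:-1])


# Example with a made-up URL containing a space that needs encoding.
url = "ftp://ftp.example.org/pub/" + "a" * 60 + "/file name.txt"
param = to_url_parameter(url)
print(param)
print(from_url_parameter(param))
```

Passing sep="\r\n " instead of a single space would put each URL-word on its
own folded continuation line. Extraction is unaffected either way, since all
linear whitespace is discarded.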

Now, I'm not particularly wedded to this scheme, and I would be happy to change
it to anything better that someone else proposes. (This is still a draft
document, remember.) Nevertheless, the wrapping issue does need to be dealt with
one way or another. It cannot be ignored.  And it also seems to me that it would
be best to deal with it in a consistent fashion everywhere.

                                Ned