Re: Content Types


> Excerpts from ext.ietf-822: 27-Aug-91 Re: Content Types Ned
> Freed(_at_)hmcvax(_dot_)claremo (4009)
>
> [quote 1 -- Ned implies that the type field describes the document class: ]
>
> > if I get something of type image/foo, where I have never heard of foo
> > before, it is nice to know that it is an image, and that trying to
> > display it on my VT100 is probably not going to work
>
> [quote 2 -- Ned implies that the type field describes the standard
> encoding class for the document format: ]
>
> > PostScript ... even has a binary form (level II) that does not encode
> > well as text.
>
> I'm sorry to see that the type/sub-type distinction is still there,
> because it muddies the waters considerably.


If you think it muddies the waters, you're always free to simply
ignore the distinction. Simply treat all type/subtypes as opaque identifying
strings if you like.

> In particular, the discussion indicates that people are confused about
> what the first field of the `Content-Type' header is supposed to be used
> for.  Does it indicate the particular document class of the message body
> (image, formatted text, video) or does it indicate the encoding of the
> format of that document (G3FAX, LaTeX, PAL)?


It indicates what you call the class. I prefer to think of it as a
hierarchy, and the class is just the first type. I don't think there's
any confusion here.
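To make the two readings concrete, here is a minimal sketch (Python, purely for illustration) of a mail reader that uses the top-level type as a fallback when it meets an unknown subtype such as image/foo, next to the alternative of treating the whole type/subtype as an opaque string. The handler names and lookup tables are hypothetical; nothing here is taken from the proposal itself.

```python
from __future__ import annotations

# Hypothetical handler tables; the names are illustrative only.
EXACT_HANDLERS = {
    ("text", "plain"): "show-text",
    ("image", "g3fax"): "show-fax",
}

# Fallbacks keyed on the top-level type (the "class") alone.
CLASS_FALLBACKS = {
    "text": "show-raw-text",
    "image": "refuse-on-character-terminal",
}


def split_content_type(value: str) -> tuple[str, str]:
    """Split 'type/subtype' into lowercase (type, subtype), dropping parameters."""
    essence = value.split(";", 1)[0].strip().lower()
    top, _, sub = essence.partition("/")
    return top, sub


def choose_handler(value: str) -> str:
    """Hierarchical reading: exact match first, then the top-level class."""
    top, sub = split_content_type(value)
    if (top, sub) in EXACT_HANDLERS:
        return EXACT_HANDLERS[(top, sub)]
    # Unknown subtype: the top-level type still says what kind of thing it is.
    return CLASS_FALLBACKS.get(top, "save-to-file")


def choose_handler_opaque(value: str) -> str:
    """Opaque reading: the whole type/subtype string is just an identifier."""
    table = {"text/plain": "show-text", "image/g3fax": "show-fax"}
    return table.get(value.split(";", 1)[0].strip().lower(), "save-to-file")
```

Under this sketch, choose_handler("image/foo") still lands on the image-class fallback, while the opaque lookup can only fall through to a generic default. Both readings coexist: a reader that ignores the hierarchy loses nothing it would otherwise have had.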

> If it indicates the former, that says that some composite format-types,
> such as Andrew, Interleaf, CDA, etc., will show up in the sub-type field
> of several different types; e.g., Interleaf would be perfectly eligible
> for either `text-plus' or `image', depending on what it described.


The grouping is somewhat arbitrary for these composite entities. They
do _not_ appear in two places. Anything appearing in two places is bad.
And proposals to switch the order of things do not change this -- the
fact of the matter is that composite entities are difficult to deal with
no matter what. You don't clean them up any by switching the order, and
you mess up interoperability considerably if you allow things to take on
more than one name.

> Some other formats -- such as TeX, Scribe, troff -- would show up only
> in the `text-plus' field, assuming of course that troff+pic is different
> from troff.


No, they all appear in one place. See above.

> PostScript is perhaps more straightforward.  It is *simply* an image
> format, even when it is sending images of textual documents.  (Yes, yes,
> it compresses images of text documents in interesting ways that in some
> encodings even allow the text to be retrieved from the image, and that
> confuses some people :-).  But in important ways it is *not*
> `text-plus-markup' as the true `text-plus' languages are.  However, in
> that it is a programming language, it might appear under the
> `application' type.


You obviously have not talked to Adobe lately. The next phase of changes to
PostScript is going to be oriented towards making it into a revisable
markup language. Indeed, there are already applications using PostScript
that treat it as a revisable language, sticking to only a subset of the
language to make such revision feasible. There are also applications that
generate PostScript and leave the original text in, using a series of pointers
into the PostScript code so that it can be extracted automatically.

PostScript is much more than just an image description language. So say its
designers, and so say many applications that use it.

(This is not an endorsement of PostScript, by the way. I think that it makes
a pretty lousy markup language. But that does not stop people from using it
as one.)

> It would be better to regard the `type' subfield of the `Content-type'
> header as a field independent of the formatting of the message.  It
> would also be wise to regard the format encoding as a field independent
> of the format.  In particular, we must *not* assume that a document
> format of `Andrew' implies a document class of `text-plus' and we must
> *not* assume that a document format of `PostScript' implies a transport
> format of 7-bit ASCII.  I'd rather see two fields, `Document-class' and
> `Content-type', but combining them into `Content-type' is OK as long as
> they are separate in our minds.


Please. This is gloss. The present scheme is a compromise between a complex
mess of headers and packing everything on a single header. You now want to
tip things in one direction. You are in effect proposing that we lose our
compromise for no appreciable gain. Relabelling things does not change them.

We could invent a new top-level type that is used to represent things that
are composites, like PostScript, CDA, ODA, and Interleaf. However, I prefer
not to do this, but simply group each of these entities under the type that
it is most likely to be associated with.
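The transport point quoted above (that a PostScript body should not be assumed to travel as 7-bit ASCII) can be illustrated with a minimal sketch that picks a transfer encoding by inspecting the body bytes rather than the format name. The function names are hypothetical, the labels `7bit` and `base64` are borrowed from the Content-Transfer-Encoding names MIME later used, and the line limit assumes SMTP's 1000-character line (RFC 821) less the CRLF.

```python
from __future__ import annotations

import base64

# Assumption: RFC 821's 1000-character line limit, minus the CRLF.
MAX_LINE = 998


def is_seven_bit_text(body: bytes) -> bool:
    """True if every byte is 7-bit, there are no NULs, and no line is too long."""
    if any(b > 127 or b == 0 for b in body):
        return False
    # Strip a trailing CR so CRLF-terminated lines are measured without it.
    return all(len(line.rstrip(b"\r")) <= MAX_LINE for line in body.split(b"\n"))


def choose_transfer_encoding(body: bytes) -> tuple[str, bytes]:
    """Return (label, encoded_body); the document format plays no part here."""
    if is_seven_bit_text(body):
        return "7bit", body
    # Binary data, such as a binary PostScript body, is wrapped for transport.
    return "base64", base64.encodebytes(body)
```

The same PostScript document can therefore come out as `7bit` in one message and `base64` in another, which is the independence of format and transport encoding that the quoted paragraph argues for.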

> One would certainly hope and expect that the use of composites for
> documents will increase dramatically in the reasonably near future.
> Most of the non-UNIX world has been doing it for a few years now, and as
> standards such as SGML, and products such as Interleaf and FrameMaker
> proliferate in the UNIX world (and character-oriented TTYs die out),
> UNIX users will join them.  It would be very short-sighted of us not to
> have a document class for these things.


I suppose it is short-sighted of us not to have a type for smells and feelings,
too. The present limited list of top-level types is a compromise. There are
people that want about 10 additional ones and there are people who want about
4 fewer. My own bias is to add a couple but I can live with what we have now.
I would object to eliminating any of the types we have at present. The group
that wanted fewer could live with the 9 we have, but no more than that.

I don't have a problem with putting the various composite document formats
under the "next best" types.

                                        Ned