In the W3C XML SIG, Kurt Conrad and I wrote this summary for the
discussion of XML media types.
Kurt Conrad wrote...
Proposal:
This RFC will introduce both text/xml and application/xml.
Text/xml is recommended for entities that would be meaningful
to a human being without XML processing. (Thus, text/xml is
always appropriate for external DTD subsets and external
parameter entities.) Application/xml is recommended for all
others.
Transmission of XML documents encoded in UTF-16 or UCS-2 via
SMTP is a special case. We cannot use text/xml for this
purpose because of the MIME line termination rule for "text"
subtypes. Application/xml is recommended instead. (Note that
the XML PR will need a slight revision if this proposed
decision is accepted.)
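The proposal can be read as a small decision rule. The
following Python sketch is only an illustration of that rule,
not part of the proposal: the function name, its parameters,
and the set of charset labels are assumptions made for this
example.

```python
# Hypothetical helper illustrating the proposed rule; the names
# and the charset labels below are assumptions, not part of any RFC.
WIDE_CHARSETS = {"utf-16", "ucs-2", "iso-10646-ucs-2"}


def choose_xml_media_type(human_readable, charset, transport):
    """Pick a media type for an XML entity under the proposed rule.

    human_readable: True if the entity would be meaningful to a
                    human being without XML processing.
    charset:        label of the character encoding, e.g. "UTF-8".
    transport:      label of the protocol, e.g. "smtp" or "http".
    """
    # Special case: UTF-16 or UCS-2 over SMTP cannot use text/xml.
    if transport.lower() == "smtp" and charset.lower() in WIDE_CHARSETS:
        return "application/xml"
    # Entities readable without XML processing go under "text".
    if human_readable:
        return "text/xml"
    # Everything else goes under "application".
    return "application/xml"


if __name__ == "__main__":
    print(choose_xml_media_type(True, "UTF-8", "smtp"))
    print(choose_xml_media_type(True, "UTF-16", "smtp"))
    print(choose_xml_media_type(False, "UTF-8", "http"))
```

Whether an entity is "meaningful to a human being" remains a
judgment for the sender, so the sketch takes it as an input
rather than trying to compute it.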
Criteria:
RFC 2046 defines the top-level media types "text" and
"application". The definition of "text" is as follows:
3. Overview Of The Initial Top-Level Media Types [RFC 2046]
The five discrete top-level media types are:
(1) text -- textual information. The subtype "plain" in
particular indicates plain text containing no
formatting commands or directives of any sort. Plain
text is intended to be displayed "as-is". No special
software is required to get the full meaning of the
text, aside from support for the indicated character
set. Other subtypes are to be used for enriched text in
forms where application software may enhance the
appearance of the text, but such software must not be
required in order to get the general idea of the
content. Possible subtypes of "text" thus include any
word processor format that can be read without
resorting to software that understands the format. In
particular, formats that employ embedded binary
formatting information are not considered directly
readable. A very simple and portable subtype,
"richtext", was defined in RFC 1341, with a further
revision in RFC 1896 under the name "enriched".
[snip]
4.1. Text Media Type [RFC 2046]
The "text" media type is intended for sending material which is
principally textual in form. A "charset" parameter may be used to
indicate the character set of the body text for "text" subtypes,
notably including the subtype "text/plain", which is a generic
subtype for plain text. Plain text does not provide for or allow
formatting commands, font attribute specifications, processing
instructions, interpretation directives, or content markup. Plain
text is seen simply as a linear sequence of characters, possibly
interrupted by line breaks or page breaks. Plain text may allow the
stacking of several characters in the same position in the text.
Plain text in scripts like Arabic and Hebrew may also include
facilities that allow the arbitrary mixing of text segments with
opposite writing directions.
Beyond plain text, there are many formats for representing what might
be known as "rich text". An interesting characteristic of many such
representations is that they are to some extent readable even without
the software that interprets them. It is useful, then, to
distinguish them, at the highest level, from such unreadable data as
images, audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is reasonable to
show subtypes of "text" to the user, while it is not reasonable to do
so with most nontextual data. Such formatted textual data should be
represented using subtypes of "text".
4.1.1. Representation of Line Breaks [RFC 2046]
[snip]
Most XML documents clearly belong to the "text" type.
Meanwhile, the top-level type "application" is defined as
follows:
3. Overview Of The Initial Top-Level Media Types [RFC 2046]
[snip]
(5) application -- some other kind of data, typically
either uninterpreted binary data or information to be
processed by an application. The subtype "octet-
stream" is to be used in the case of uninterpreted
binary data, in which case the simplest recommended
action is to offer to write the information into a file
for the user. The "PostScript" subtype is also defined
for the transport of PostScript material. Other
expected uses for "application" include spreadsheets,
data for mail-based scheduling systems, and languages
for "active" (computational) messaging, and word
processing formats that are not directly readable.
Note that security considerations may exist for some
types of application data, most notably
"application/PostScript" and any form of active
messaging. These issues are discussed later in this
document.
[snip]
4.5. Application Media Type [RFC 2046]
The "application" media type is to be used for discrete data which do
not fit in any of the other categories, and particularly for data to
be processed by some type of application program. This is
information which must be processed by an application before it is
viewable or usable by a user. Expected uses for the "application"
media type include file transfer, spreadsheets, data for mail-based
scheduling systems, and languages for "active" (computational)
material. (The latter, in particular, can pose security problems
which must be understood by implementors, and are considered in
detail in the discussion of the "application/PostScript" media type.)
For example, a meeting scheduler might define a standard
representation for information about proposed meeting dates. An
intelligent user agent would use this information to conduct a dialog
with the user, and might then send additional material based on that
dialog. More generally, there have been several "active" messaging
languages developed in which programs in a suitably specialized
language are transported to a remote location and automatically run
in the recipient's environment.
Such applications may be defined as subtypes of the "application"
media type. This document defines two subtypes:
octet-stream, and PostScript.
The subtype of "application" will often be either the name or include
part of the name of the application for which the data are intended.
This does not mean, however, that any application program name may be
used freely as a subtype of "application".
Some XML data probably belong to this class: data that must
be processed by an application before a user can view or use
them. This is one reason to introduce application/xml.
Another reason for application/xml is the delivery of XML
documents in UTF-16 via SMTP. RFC 2046 requires the canonical
form of every "text" subtype to represent a line break as the
CRLF octet sequence. UTF-16 encodes CR and LF as two octets
each, so a UTF-16 document cannot satisfy this rule. Although
HTTP loosens the rule, SMTP does not. Thus, the only choice
is application/xml.
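The line termination problem is visible at the octet level.
The following Python sketch is an illustration only: the
helper names are made up for this example, and the codec
labels "utf-16-be" and "utf-16-le" are Python's names, chosen
here so that no byte order mark is added.

```python
# The octet pair that MIME "text" subtypes use for a line break.
CRLF_OCTETS = b"\r\n"  # 0x0D 0x0A


def encoded_line_break(charset):
    """Return the octets that a CR LF pair becomes in the given charset."""
    return "\r\n".encode(charset)


def uses_crlf_octets(charset):
    """True if a line break in this charset is exactly the octets 0D 0A."""
    return encoded_line_break(charset) == CRLF_OCTETS


if __name__ == "__main__":
    for charset in ("utf-8", "utf-16-be", "utf-16-le"):
        octets = encoded_line_break(charset)
        print(charset, octets.hex(), uses_crlf_octets(charset))
```

In both UTF-16 byte orders a zero octet separates 0D from 0A,
so the octet pair expected for "text" subtypes never appears.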
References:
RFC 1896
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1896.txt
RFC 1341
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1341.txt
RFC 2046
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2046.txt
----
MURATA Makoto muraw3c(_at_)attglobal(_dot_)net