ietf-xml-mime
Fwd: Text/xml vs application/xml

2000-03-22 09:45:58
In the W3C XML SIG, Kurt Conrad and I wrote this summary for the 
discussion of XML media types.


Kurt Conrad wrote...
Proposal:

This RFC will introduce both text/xml and application/xml.
Text/xml is recommended for entities that would be meaningful
to a human being without XML processing.  (Thus, text/xml is
always appropriate for external DTD subsets and external
parameter entities.)  Application/xml is recommended for all
others.

Transmission of XML documents encoded in UTF-16 or UCS-2 via
SMTP is a special case.  For this purpose, we cannot use
text/xml, because of the line termination rule of MIME.
Application/xml is recommended instead.  (Note that the XML
Proposed Recommendation (PR) needs slight revision if this
proposed decision is accepted.)
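
The proposal above can be read as a small decision rule.  The
Python sketch below is only an illustration of that reading:
the function name, its parameters, and the list of charset
labels are hypothetical, and whether an entity is "meaningful
to a human being without XML processing" remains a judgment
the sender has to supply.

```python
def choose_xml_media_type(human_readable: bool, charset: str, via_smtp: bool) -> str:
    """Return the media type the proposal would recommend (illustrative only)."""
    normalized = charset.strip().lower().replace("_", "-")
    # Assumed set of labels for 16-bit encodings; not taken from the proposal.
    is_16_bit = normalized.startswith("utf-16") or normalized in {
        "ucs-2",
        "iso-10646-ucs-2",
    }
    # Special case: UTF-16 or UCS-2 over SMTP cannot meet the MIME
    # line termination rule for "text", so text/xml is ruled out.
    if via_smtp and is_16_bit:
        return "application/xml"
    # General rule: text/xml for entities a human can read without
    # XML processing (for example, external DTD subsets).
    if human_readable:
        return "text/xml"
    return "application/xml"


if __name__ == "__main__":
    print(choose_xml_media_type(True, "UTF-8", via_smtp=True))
    print(choose_xml_media_type(True, "UTF-16", via_smtp=True))
    print(choose_xml_media_type(False, "UTF-8", via_smtp=False))
```

Under this reading, the special case takes priority: a readable
DTD in UTF-16 sent by mail still ends up as application/xml.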


Criteria:

RFC 2046 defines the top-level media types "text" and 
"application".  It defines "text" as follows:

3.  Overview Of The Initial Top-Level Media Types [RFC 2046]
  The five discrete top-level media types are:
   (1)   text -- textual information.  The subtype "plain" in
         particular indicates plain text containing no
         formatting commands or directives of any sort. Plain
         text is intended to be displayed "as-is". No special
         software is required to get the full meaning of the
         text, aside from support for the indicated character
         set. Other subtypes are to be used for enriched text in
         forms where application software may enhance the
         appearance of the text, but such software must not be
         required in order to get the general idea of the
         content.  Possible subtypes of "text" thus include any
         word processor format that can be read without
         resorting to software that understands the format.  In
         particular, formats that employ embedded binary
         formatting information are not considered directly
         readable. A very simple and portable subtype,
         "richtext", was defined in RFC 1341, with a further
         revision in RFC 1896 under the name "enriched".

[snip]
4.1.  Text Media Type [RFC 2046]

  The "text" media type is intended for sending material which is
  principally textual in form.  A "charset" parameter may be used to
  indicate the character set of the body text for "text" subtypes,
  notably including the subtype "text/plain", which is a generic
  subtype for plain text.  Plain text does not provide for or allow
  formatting commands, font attribute specifications, processing
  instructions, interpretation directives, or content markup.  Plain
  text is seen simply as a linear sequence of characters, possibly
  interrupted by line breaks or page breaks.  Plain text may allow the
  stacking of several characters in the same position in the text.
  Plain text in scripts like Arabic and Hebrew may also include
  facilities that allow the arbitrary mixing of text segments with
  opposite writing directions.

  Beyond plain text, there are many formats for representing what might
  be known as "rich text".  An interesting characteristic of many such
  representations is that they are to some extent readable even without
  the software that interprets them.  It is useful, then, to
  distinguish them, at the highest level, from such unreadable data as
  images, audio, or text represented in an unreadable form. In the
  absence of appropriate interpretation software, it is reasonable to
  show subtypes of "text" to the user, while it is not reasonable to do
  so with most nontextual data. Such formatted textual data should be
  represented using subtypes of "text".

4.1.1.  Representation of Line Breaks [RFC 2046]

[snip]


By these definitions, most XML documents clearly belong to 
the "text" type: a reader can get the general idea of the 
content without software that understands XML.

Meanwhile, the top-level type "application" is defined as
follows:

3.  Overview Of The Initial Top-Level Media Types [RFC 2046]
[snip]
   (5)   application -- some other kind of data, typically
         either uninterpreted binary data or information to be
         processed by an application.  The subtype "octet-
         stream" is to be used in the case of uninterpreted
         binary data, in which case the simplest recommended
         action is to offer to write the information into a file
         for the user.  The "PostScript" subtype is also defined
         for the transport of PostScript material.  Other
         expected uses for "application" include spreadsheets,
         data for mail-based scheduling systems, and languages
         for "active" (computational) messaging, and word
         processing formats that are not directly readable.
         Note that security considerations may exist for some
         types of application data, most notably
         "application/PostScript" and any form of active
         messaging.  These issues are discussed later in this
         document.
[snip]
4.5.  Application Media Type [RFC 2046]
  The "application" media type is to be used for discrete data which do
  not fit in any of the other categories, and particularly for data to
  be processed by some type of application program.  This is
  information which must be processed by an application before it is
  viewable or usable by a user.  Expected uses for the "application"
  media type include file transfer, spreadsheets, data for mail-based
  scheduling systems, and languages for "active" (computational)
  material.  (The latter, in particular, can pose security problems
  which must be understood by implementors, and are considered in
  detail in the discussion of the "application/PostScript" media type.)
  For example, a meeting scheduler might define a standard
  representation for information about proposed meeting dates.  An
  intelligent user agent would use this information to conduct a dialog
  with the user, and might then send additional material based on that
  dialog.  More generally, there have been several "active" messaging
  languages developed in which programs in a suitably specialized
  language are transported to a remote location and automatically run
  in the recipient's environment.
  Such applications may be defined as subtypes of the "application"
  media type. This document defines two subtypes:
  octet-stream, and PostScript.
  The subtype of "application" will often be either the name or include
  part of the name of the application for which the data are intended.
  This does not mean, however, that any application program name may be
  used freely as a subtype of "application".


Some XML data probably belong to this class, namely data 
that must be processed by an application before a user can 
view or use it.  This is one reason to introduce 
application/xml.

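Another reason for application/xml is the delivery of XML 
documents in UTF-16 by SMTP.  RFC 2046 requires the 
canonical form of every "text" subtype to represent a line 
break as a CRLF sequence, and forbids CR and LF outside 
line breaks.  In UTF-16 each of these characters occupies 
two octets, so the rule cannot be satisfied.  Although HTTP 
loosens this rule, SMTP does not.  Thus, for UTF-16 
documents sent by SMTP, the only choice is application/xml.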
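
The conflict between UTF-16 and the line termination rule is
easiest to see at the octet level.  The short Python sketch
below is an illustration only; the helper name
contains_crlf_octets is hypothetical, and big-endian UTF-16
without a byte order mark is assumed for simplicity.

```python
def contains_crlf_octets(data: bytes) -> bool:
    """Report whether the octet pair 0x0D 0x0A occurs anywhere in data."""
    return b"\r\n" in data


# A line break in UTF-8 is exactly the two octets a mail transport expects.
utf8_break = "\r\n".encode("utf-8")

# The same line break in UTF-16 (big-endian) is four octets,
# with a zero octet in front of CR and in front of LF.
utf16_break = "\r\n".encode("utf-16-be")

# Conversely, the single character U+0D0A encodes in UTF-16 as the
# octets 0x0D 0x0A, which look like a line break but are not one.
utf16_letter = "\u0d0a".encode("utf-16-be")

if __name__ == "__main__":
    print(utf8_break.hex(), contains_crlf_octets(utf8_break))
    print(utf16_break.hex(), contains_crlf_octets(utf16_break))
    print(utf16_letter.hex(), contains_crlf_octets(utf16_letter))
```

A transport that treats 0x0D 0x0A as a line break would thus
miss the real line breaks in a UTF-16 body and could misread
ordinary characters as line breaks.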


References:

RFC 1896
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1896.txt

RFC 1341
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1341.txt

RFC 2046
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2046.txt



----
MURATA Makoto  muraw3c(_at_)attglobal(_dot_)net