Re: text/xhtml+xml vs. application/xhtml+xml

(HTML WG BCCd - the new w3.org spam filter makes it impractical to CC
them.  Also, redirected to ietf-xml-mime(_at_)imc(_dot_)org, as you had sent it
to xml-mime-types(_at_)imc(_dot_)org)

Hi Dan,

Dan Kohn wrote:


> Mark, I would appreciate if the HTML WG could provide a little more context
> on their thinking, perhaps by adding to discussion to the eventual XHTML
> MIME registration.
>
> First, I'm not convinced that text/ is the correct top-level type.  Section
> 3 of <http://www.imc.org/draft-murata-xml> says:
>
>    If an XML document -- that is, the unprocessed, source XML document
>    -- is readable by casual users, text/xml is preferable to
>    application/xml. MIME user agents (and web user agents) that do not
>    have explicit support for text/xml will treat it as text/plain, for
>    example, by displaying the XML entity as plain text. Application/xml
>    is preferable when the XML MIME entity is unreadable by casual
>    users. Similarly, text/xml-external-parsed-entity is preferable when
>    an external parsed entity is readable by casual users, but
>    application/xml-external-parsed-entity is preferable when a plain
>    text display is inappropriate.
>
>       NOTE: Users are in general not used to text containing tags such
>       as <price>, and often find such tags quite disorienting or
>       annoying. If one is not sure, the conservative principle would
>       suggest using application/* instead of text/* so as not to put
>       information in front of users that they will quite likely not
>       understand.


That's interesting.  I guess I hadn't read that section.  Are you
attempting to update RFC 2046 on this subject?

From RFC 2046, Sec 4.1:


"Beyond plain text, there are many formats for representing what might
be known as "rich text". An interesting characteristic of many such
representations is that they are to some extent readable even without
the software that interprets them. It is useful, then, to distinguish
them, at the highest level, from such unreadable data as images, audio,
or text represented in an unreadable form. In the absence of appropriate
interpretation software, it is reasonable to show subtypes of "text" to
the user, while it is not reasonable to do so with most nontextual data.
Such formatted textual data should be represented using subtypes of
"text"."

While this doesn't go into as much depth as draft-murata-xml does, the
HTML WG believes, despite the DOCTYPE/xmlns/HTML-header preamble, that
the bulk (i.e. the body) of most XHTML documents will be useful, to
"some extent" (per above), to casual users.
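
For illustration, the fallback behaviour at stake can be sketched in a few
lines of Python. This is only a rough reading of the RFC 2046 defaulting
rules, not anything the WG has specified; the names fallback_type and
KNOWN_TYPES are made up for the example, and the set of "known" types is an
assumption standing in for whatever a given user agent supports.

```python
# Illustrative sketch only: fallback_type and KNOWN_TYPES are hypothetical
# names, and KNOWN_TYPES stands in for an agent with no XHTML support.
KNOWN_TYPES = frozenset({"text/plain", "text/html"})


def fallback_type(content_type, known=KNOWN_TYPES):
    """Return the media type an agent would end up handling the entity as."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in known:
        return media_type

    top_level = media_type.split("/", 1)[0]
    if top_level == "text":
        # RFC 2046: unrecognized text subtypes are treated as text/plain
        # (provided the agent can handle the charset).
        return "text/plain"
    if top_level == "multipart":
        # RFC 2046: unrecognized multipart subtypes are treated as mixed.
        return "multipart/mixed"
    # Everything else falls back to opaque binary data.
    return "application/octet-stream"


for candidate in ("text/xhtml+xml", "application/xhtml+xml"):
    print(candidate, "->", fallback_type(candidate))
```

Under those assumptions, an agent without XHTML support would display
text/xhtml+xml as plain text, but would treat application/xhtml+xml as
opaque data, typically offering to save it.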

> It seems like application/* is thus the safer bet.  Moreover, section 2.11
> of <http://www.w3.org/TR/REC-xml> already standardizes end-of-line handling,
> so the canonicalization of line endings that text/* supports does not seem
> necessary.


True.  That's a small point against text/* handling.  But we feel that
the text/plain fallback is more valuable.
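
For reference, the end-of-line handling in section 2.11 of the XML
Recommendation amounts to the following, sketched here in Python. The
function name is hypothetical; the rule itself (CR LF, and any CR not
followed by LF, become a single LF before parsing) is taken from the
Recommendation.

```python
import re


def normalize_line_ends(text):
    """Apply XML 1.0 section 2.11 end-of-line normalization.

    Each CR LF pair, and each CR not followed by LF, is translated
    to a single LF, as an XML processor does before parsing.
    """
    return re.sub(r"\r\n?", "\n", text)


sample = "<p>one</p>\r\n<p>two</p>\r<p>three</p>\n"
print(repr(normalize_line_ends(sample)))
```

Since every conforming XML processor does this on input, the CRLF
canonical form that text/* entities use in transit makes no difference
to what the parser sees.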

Something we did consider, but didn't really come to consensus on
(AFAIK), at this morning's call was the possibility of registering both
application/xhtml+xml and text/xhtml+xml, and letting server admins
decide which one wins (or if both are useful).  Any thoughts about that?

> Also, I would like to see some detailed discussion of when to use
> application/xhtml+xml and when to use text/html.  This seems like an upward
> compatibility challenge of exceeding subtlety, and may deserve more
> attention than it received in your IRC conversation.


I'll follow that up in a separate message, hopefully soon.

> Thanks in advance for any insight you can provide into your and the WG's
> thinking.


No problem.

MB