RE: text/xhtml+xml vs. application/xhtml+xml

Rather than trying to channel what the authors of RFC 2046 were thinking, it
seems like we should just ask them.  Specifically, Ned Freed has been
instrumental in framing the issues we discuss in
<http://www.imc.org/draft-murata-xml>, is (I believe) the IESG member
presenting the draft, and also co-authored RFC 2046.

Ned, my understanding is that text/html is seen in retrospect as problematic
and has led to some of the user unhappiness with MIME, by presenting
information to users that they are unlikely to be able to deal with.  The
referenced paragraph in the draft was meant to capture some of the
discussion on the ietf-xml-mime(_at_)imc(_dot_)org list, though I am on too
slow a modem to find the references.  I suspect that the distinction of
application versus text comes down to whether you expect most users to be
like us (who can heuristically recognize HTML, save to a file with the right
file extension, and then open in a browser) versus my mother, who just gets
annoyed with all of the cruft and gives up.

At the end of the day, I would evaluate based on failure scenarios.  I think
people would rather see an attachment than the source text, and are more
likely to be able to recover from the former.
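
To make the two failure scenarios concrete, here is a minimal Python sketch
of the default handling RFC 2046 describes for a user agent that has no
handler for a media type.  The function name fallback_type and the supported
set are hypothetical, chosen only for illustration, and the sketch ignores
charset and multipart handling; real mail clients and browsers differ in
detail.

```python
def fallback_type(media_type, supported):
    """Return the media type an agent would fall back to (RFC 2046 defaults).

    `supported` is a hypothetical set of types the agent handles natively.
    """
    media_type = media_type.strip().lower()
    if media_type in supported:
        return media_type
    top_level = media_type.split("/", 1)[0]
    if top_level == "text":
        # Unrecognized text subtypes are shown as plain text, so the user
        # sees the raw markup.
        return "text/plain"
    # Simplification: everything else is treated as opaque data, which
    # agents typically offer as an attachment to save.
    return "application/octet-stream"


supported = {"text/plain", "text/html"}
print(fallback_type("text/xhtml+xml", supported))
print(fallback_type("application/xhtml+xml", supported))
```

Under these assumptions the text/* variant degrades to source text on
screen, while the application/* variant degrades to an attachment.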

In principle, your suggestion of registering both text/xhtml+xml and
application/xhtml+xml could enable the document author to decide based on
the fallback behavior desired.  However, this is unlikely to work as
expected due to the widespread practice of mapping MIME types to file
extensions, which provides insufficient granularity.  Also, if we are having
trouble evaluating the tradeoffs, it seems unlikely that most document
authors would understand the subtlety.  Finally, I also subscribe to section
3.2 of RFC 1958 on Architectural Principles of the Internet, which says, "If
there are several ways of doing the same thing, choose one."

                - dan

P.S.  Mark, I presume you will keep the HTML WG in the loop on this
discussion.

--
Dan Kohn <mailto:dan(_at_)dankohn(_dot_)com>
<http://www.dankohn.com>  <tel:+1-650-327-2600>

-----Original Message-----
From: Mark Baker [mailto:mark(_dot_)baker(_at_)canada(_dot_)sun(_dot_)com]
Sent: Wednesday, 2000-10-18 11:41
To: Dan Kohn
Cc: xml-mime-types(_at_)imc(_dot_)org
Subject: Re: text/xhtml+xml vs. application/xhtml+xml


(HTML WG BCCd - the new w3.org spam filter makes it impractical to CC
them)

Hi Dan,

Dan Kohn wrote:


Mark, I would appreciate if the HTML WG could provide a little more context
on their thinking, perhaps by adding discussion to the eventual XHTML
MIME registration.

First, I'm not convinced that text/ is the correct top-level type.  Section
3 of <http://www.imc.org/draft-murata-xml> says:

   If an XML document -- that is, the unprocessed, source XML document
   -- is readable by casual users, text/xml is preferable to
   application/xml. MIME user agents (and web user agents) that do not
   have explicit support for text/xml will treat it as text/plain, for
   example, by displaying the XML entity as plain text. Application/xml
   is preferable when the XML MIME entity is unreadable by casual
   users. Similarly, text/xml-external-parsed-entity is preferable when
   an external parsed entity is readable by casual users, but
   application/xml-external-parsed-entity is preferable when a plain
   text display is inappropriate.

      NOTE: Users are in general not used to text containing tags such
      as <price>, and often find such tags quite disorienting or
      annoying. If one is not sure, the conservative principle would
      suggest using application/* instead of text/* so as not to put
      information in front of users that they will quite likely not
      understand.


That's interesting.  I guess I hadn't read that section.  Are you
attempting to update RFC 2046 on this subject?

From RFC 2046, Sec 4.1:


"Beyond plain text, there are many formats for representing what might
be known as "rich text". An interesting characteristic of many such
representations is that they are to some extent readable even without
the software that interprets them. It is useful, then, to distinguish
them, at the highest level, from such unreadable data as images, audio,
or text represented in an unreadable form. In the absence of appropriate
interpretation software, it is reasonable to show subtypes of "text" to
the user, while it is not reasonable to do so with most nontextual data.
Such formatted textual data should be represented using subtypes of
"text"."

While this doesn't go into as much depth as draft-murata-xml does, the
HTML WG believes, despite the DOCTYPE/xmlns/HTML-header preamble, that
the bulk (i.e. body) of most XHTML documents will be useful, to "some
extent" (per above), to casual users.

It seems like application/* is thus the safer bet.  Moreover, section 2.11
of <http://www.w3.org/TR/REC-xml> already standardizes end-of-line handling,
so the canonicalization of line endings that text/* supports does not seem
necessary.


True.  That's a small point against text/* handling.  But we feel that
the text/plain fallback is more valuable.
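
For reference, the end-of-line rule in section 2.11 of the XML
Recommendation can be sketched in a few lines of Python.  The function name
is hypothetical, and this is only an illustration of the rule (CR LF pairs
and lone CRs both become a single LF), not code from any particular parser.

```python
def normalize_xml_line_endings(text):
    """Apply XML 1.0 section 2.11 style end-of-line normalization."""
    # Replace CR LF pairs first, so that the lone-CR pass below does not
    # turn one line break into two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "<p>one</p>\r\n<p>two</p>\r<p>three</p>\n"
print(repr(normalize_xml_line_endings(sample)))
```

Since an XML processor does this on input anyway, the line-ending
canonicalization that text/* transport provides adds little for XHTML.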

Something we did consider, that we didn't really come to consensus on
(AFAIK) at this morning's call was the possibility of registering both
application/xhtml+xml and text/xhtml+xml, and letting server admins
decide which one wins (or if both are useful).  Any thoughts about that?

Also, I would like to see some detailed discussion of when to use
application/xhtml+xml and when to use text/html.  This seems like an upward
compatibility challenge of exceeding subtlety, and may deserve more
attention than it received in your IRC conversation.


I'll follow that up in a separate message, hopefully soon.

Thanks in advance for any insight you can provide into your and the WG's
thinking.


No problem.

MB