Comments on mime-respect


These are my comments on http://www.w3.org/2001/tag/doc/mime-respect.html,
with various issues somewhat mixed together, sorry.
[I have cross-posted to ietf-xml-mime(_at_)imc(_dot_)org because some of
them are relevant to the recent discussion about the charset parameter
on Content-Type.]

- Headings: Is this a completed finding, or a draft finding?

- "HTTP/1.1 a a response": word duplication

- Overall, it seems difficult to identify what is general architecture,
  and what is described the way it is only because that is how things
  (mostly) are today.

- My understanding is that one origin of the 'charset' parameter was
  that it was useful to invoke different applications for different
  values. That was definitely the case 10 years or so ago when MIME
  was designed. I remember reading my email that way. This has gone away.
  It may happen that in a somewhat similar way, a lot of what we now
  see as different XML types, in need of different applications, may
  go away in a few years.

- Section 4: "The Unicode encoding of a message body (XML document) is
  inconsistent with the value of the charset parameter in the message
  headers."
  - Please replace 'Unicode encoding' with 'character encoding'.
    It would be strange to e.g. call iso-8859-1 a 'Unicode encoding'.
  - Please remove, or reword "XML document", to not give the impression
    that message bodies are always XML documents.
  - I'm not clear why this is in section 4, entitled "Why user agent
    behavior that misrepresents the user is harmful". This is a
    server problem, the user is not in any way misrepresented.
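
  To make the inconsistency in question concrete, here is a minimal
  sketch in Python. It is my own illustration, not something from the
  finding; the function names are hypothetical, and it only handles
  ASCII-compatible encodings (no UTF-16, no BOM) and does not resolve
  charset aliases.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the body. Assumption: the body is in an ASCII-compatible encoding.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def header_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body):
    """Return the lowercased encoding named in the XML declaration, or None."""
    match = XML_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def is_consistent(content_type, body):
    """True unless both labels are present and they name different encodings."""
    from_header = header_charset(content_type)
    from_body = declared_encoding(body)
    if from_header is None or from_body is None:
        return True  # nothing to compare, so no conflict can be detected
    return from_header == from_body


body = b'<?xml version="1.0" encoding="utf-8"?><doc/>'
print(is_consistent("application/xml; charset=iso-8859-1", body))
print(is_consistent("application/xml; charset=UTF-8", body))
```

  Note that this only compares two labels with each other. It says
  nothing about whether either label matches the actual bytes.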

- The big problem with wrong encoding information for XML and other
  documents is not in a server-user context (where the user has
  to be able to read the document, such problems are usually
  discovered very quickly), but with XML sent between machines.
  This probably should be noted.

- The structure of sections 3 and 4 should be improved. It is good
  style to have an introductory paragraph or two before the first subsection.
  It is confusing to have a few paragraphs in the first subsection
  of the section after a lot of text that is not in subsections.

- "For this reason, servers should only supply a character encoding
   header when there is complete certainty as to the encoding in use.
   Otherwise, an error will cause a perfectly usable representation
   to be rejected by an architecturally sound client."

   Why doesn't the document say e.g. that a MIME type should only be
   supplied when there is complete certainty that this type is
   appropriate? Why does this text assume that the XML is 'perfectly
   usable'? It might not be valid, it might have the wrong MIME type,
   or it might not have the right 'encoding' attribute.

- "Servers which generate representations MUST NOT generate the charset
   parameter unless there is certainty that the headers are correct.
   When correct, this information can be used by non-XML processors
   to determine authoritatively the character encoding of the XML MIME
   entity."

   How is a server ever going to know, or going to be able to check,
   what the right character encoding is? Making this a requirement
   on the server itself seems inadequate.
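
   A small Python sketch of why checking is hard (again my own
   illustration, with a hypothetical helper name): a trial decode can
   catch some wrong labels, but iso-8859-1 accepts every byte sequence,
   so a successful decode proves nothing.

```python
def decodes_cleanly(body, charset):
    """True if the bytes can be decoded with the given charset label."""
    try:
        body.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers charset names Python does not know.
        return False


latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'

# Latin-1 bytes mislabeled as UTF-8: the error is detectable.
print(decodes_cleanly(latin1_bytes, "utf-8"))

# UTF-8 bytes mislabeled as iso-8859-1: decoding succeeds, but the text
# comes out as two wrong characters instead of one accented letter.
print(decodes_cleanly(utf8_bytes, "iso-8859-1"))
print(utf8_bytes.decode("iso-8859-1") == "caf\u00c3\u00a9")
```

   So the server can at best rule out some labels. It cannot confirm one
   without information from whoever produced the document.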

- Section 5: "For instance, the http-equiv attribute of the HTML meta
  element is intended for servers (not clients)."
  Please change 'is' to 'was'. In particular with respect to character
  encoding, current practice is that it's used on the client. If you
  think that this should change, you should say so.
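
  As a rough picture of that client-side practice, here is a Python
  sketch of a client pulling the charset out of a meta http-equiv
  element. The class and function names are hypothetical, and real
  browsers do considerably more than this (prescanning raw bytes,
  handling other meta forms, and so on).

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)  # HTMLParser lowercases attribute names
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        content = attributes.get("content")
        if content:
            # Reuse the stdlib header parser for the Content-Type value.
            msg = Message()
            msg["Content-Type"] = content
            self.charset = msg.get_content_charset()


def meta_charset(html_text):
    """Return the lowercased charset named in the document's meta element, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=Shift_JIS"></head><body></body></html>')
print(meta_charset(page))
```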

- SMIL 2.0 is "outmoded": I would prefer a different word here.
  I strongly agree that what SMIL 2.0 is saying on content types
  is a very bad idea, and I have said so to the SMIL WG (and more
  recently the Voice browser WG, I think). But given the 2001
  date, I don't think 'outmoded' is the right word, because it was
  never in fashion in the first place.

- Section 6: There is advice to server managers and authors. But
  I think we need to go one more step back, to server implementers
  and the default settings when servers are shipped.
  For example, some servers have an easy way to explore configurations
  and check settings. Others don't. Some servers come with default
  configurations that may be suboptimal. For example (not picking on
  it, just because that's the one I know), Apache at
  http://httpd.apache.org/docs-2.0/en/mod/core.html#adddefaultcharset
  says: "AddDefaultCharset On enables Apache's internal default charset
  of iso-8859-1 as required by the directive."
  Also, the default configuration file contains this:
   #
   # Specify a default charset for all pages sent out. This is
   # always a good idea and opens the door for future internationalisation
   # of your web site, should you ever want it. Specifying it as
   # a default does little harm; as the standard dictates that a page
   # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
   # are merely stating the obvious. There are also some security
   # reasons in browsers, related to javascript and URL parsing
   # which encourage you to always set a default char set.
   #
   AddDefaultCharset ISO-8859-1

  This seems to be 180 degrees opposite to what the TAG is saying.
  It is more about text/html,... than about application/...+xml, but
  there is considerable potential for harm here, too, in particular
  when combined with the default setting that Apache comes with that
  does not allow people managing a directory to override file info.


Regards,     Martin.