ietf-xml-mime

Re: Some text that may be useful for the update of RFC 2376

2000-03-15 09:07:26
At 10:39 PM 3/15/00 +0800, Rick Jelliffe wrote:
What about this?

      1) In all cases, charset parameter is required.
      There is no default. Failure is an unrecoverable
      error, for general applications. Detection is
      mandatory.

I hate this but it's hard to disagree with.  text/xml is the one format
where you can, in a high proportion of cases, send it without worrying
too much about the encoding and The Right Thing Will Happen.  But I gather
that doing so amounts to an assertion that the encoding is US-ASCII 
[unless we can figure out how to dodge RFC2046].  Clearly this is 
intolerable.  

<surprised>Hold on a second!  I just went and read RFC 2046, section
4.1.2, and it seems to me that the US-ASCII default is only compulsory
for text/plain!  I quote the text:

   A critical parameter that may be specified in the Content-Type field
   for "text/plain" data is the character set.  This is specified with a
   "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

   Unlike some other parameter values, the values of the charset
   parameter are NOT case sensitive.  The default character set, which
   must be assumed in the absence of a charset parameter, is US-ASCII.

   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well.  For other subtypes of "text"
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.

Followed by many pages of discussion of the meaning of ASCII and life.
</surprised>

So: are we really *really* sure that we have to default to US-ASCII,
or <important>that we have to default at all</important>?  
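The reading of RFC 2046, section 4.1.2, above can be made concrete with a minimal sketch using Python's standard `email.message.Message` class. The function name `effective_charset` is hypothetical. The sketch assumes that reading: the mandatory US-ASCII default applies only to text/plain, and other text subtypes get no default at all, returned here as `None`.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type value, or None.

    Assumes the reading of RFC 2046 section 4.1.2 in which the US-ASCII
    default is compulsory only for text/plain.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value (charset values are not
    # case sensitive) and returns None when the parameter is absent.
    charset = msg.get_content_charset()
    if charset is not None:
        return charset
    if msg.get_content_type() == "text/plain":
        return "us-ascii"  # the default RFC 2046 requires for text/plain
    return None  # assumed "no default" case, e.g. text/xml


for ct in ("text/plain", "text/plain; charset=ISO-8859-1",
           "text/xml", "text/xml; charset=UTF-16"):
    print(ct, "->", effective_charset(ct))
```

Under this reading, an unlabelled text/xml body carries no charset assertion at all, which leaves room for the XML processor's own encoding detection.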

      2) In all cases, all code sequences in
      the document must match code sequences allowed
      by the encoding specified by the charset parameter.
      Failure is an unrecoverable error, for general
      applications. Detection is not mandatory.
      
      3) In all cases, if the document starts with a BOM,
      the charset parameter must indicate which flavour
      of UTF-16 is being used. There is no default.
      Failure is an unrecoverable error, for general
      applications. Detection is not mandatory, but should
      be made so at some future date.
... 
The reason for 3) is that, as Murata-san's proposed
Japanese Profile of XML makes clear, there are Japanese flavours
of Unicode floating about. So just relying on the BOM is not
satisfactory. 

This is going too far.  If there are bogus things in Japan claiming
to be UTF-16 when they're really not, we should not visibly strain the
architecture to accommodate them.  The BOM is in practice a highly robust
and efficient mechanism; telling server applications that they must
distinguish the big-endian (BE) and little-endian (LE) flavors is an order
that is unlikely to be followed, and if followed, unlikely to be implemented
correctly, especially since it serves no useful purpose whatsoever.
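To illustrate why the BOM alone is enough to tell the two UTF-16 flavors apart, here is a minimal Python sketch; the function name `utf16_byte_order` is hypothetical. It inspects only the first two bytes (FE FF for big-endian, FF FE for little-endian), and Python's generic `utf-16` codec uses the same bytes to choose the byte order when decoding.

```python
import codecs
from typing import Optional


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Return "UTF-16BE" or "UTF-16LE" based on a leading BOM, else None."""
    # Caveat: a UTF-32LE BOM (FF FE 00 00) also begins with FF FE, so this
    # sketch assumes the caller already knows the data is UTF-16.
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "UTF-16BE"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "UTF-16LE"
    return None


doc = "<?xml version='1.0'?><doc/>"
big = codecs.BOM_UTF16_BE + doc.encode("utf-16-be")
little = codecs.BOM_UTF16_LE + doc.encode("utf-16-le")

for data in (big, little):
    # The generic "utf-16" codec consumes the BOM and picks the byte order
    # itself, so no BE/LE label is needed to decode correctly.
    print(utf16_byte_order(data), data.decode("utf-16") == doc)
```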

      4) If the document is sent text/xml, the encoding
      parameter of the XML header is not checked. However,
      well-behaved systems should rewrite the encoding
      attribute of the XML header to agree with charset 
      parameter. 

Er, by "encoding parameter of the XML header", do you mean the encoding
declaration?  You need to say "by the transfer agent"; end-user software
should certainly feel free to check this, to catch breakage at the
transfer level.
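The kind of check end-user software might run to catch transfer-level breakage, comparing the charset parameter against the encoding declaration, can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes an ASCII-compatible encoding: a UTF-16 document's declaration is not plain ASCII bytes, so the pattern simply finds no declaration. Names are compared case-insensitively, as RFC 2046 does for charset values, but aliases such as `latin1` versus `ISO-8859-1` are not resolved.

```python
import re
from typing import Optional

# Matches the encoding declaration inside an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="UTF-8"?>.
# The name pattern follows XML 1.0's EncName production.
_ENC_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


def encoding_declaration(data: bytes) -> Optional[str]:
    """Return the declared encoding name, or None if there is none."""
    m = _ENC_DECL.match(data)
    return m.group(2).decode("ascii") if m else None


def declaration_agrees(charset: str, data: bytes) -> Optional[bool]:
    """True/False if a declaration exists, None if there is nothing to check."""
    declared = encoding_declaration(data)
    if declared is None:
        return None
    return declared.lower() == charset.lower()


body = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'
print(declaration_agrees("iso-8859-1", body))  # labels agree
print(declaration_agrees("utf-8", body))       # possible transfer breakage
```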

      5) If the data is sent application/xml then
      the charset parameter must agree with the
      encoding attribute of the XML header. Failure is
      an unrecoverable error, for general applications.
      Detection is not mandatory.

Same points as above.  Why must this be supplied?  -Tim