ietf-xml-mime

Re: Some text that may be useful for the update of RFC 2376

2000-03-13 17:25:31


"Martin J. Duerst" wrote:

I came up with this for a different purpose, but Dan Connolly
suggested it might be added to an update of RFC 2376, as a
quick overview:

Here is my suggested amendment, which removes dubious wiggle room and
weasel words and makes the result purely deterministic:

- XML sent (e.g. by mail or HTTP) as text/xml (or equivalent, e.g.
text/vnd.wap.wml):

as text/"anything" in other words

  - Charset parameter is strongly recommended

Charset parameter is required if the charset is not UTF-8 or UTF-16

  - If no charset parameter, default is ASCII. The default of iso-8859-1 in
    HTTP is explicitly overridden in the specification of the charset
    parameter in section 3.1 "Text/xml Registration" of RFC 2376
    (http://www.ietf.org/rfc/rfc2376.txt)

The charset (not default, but THE charset) is UTF-16 (if BOM) or UTF-8 (if
no BOM) and the "default" of iso-8859-1 in HTTP and US-ASCII in mail is
explicitly overridden ...

  - No error handling provisions
  - An encoding declaration, if present, is irrelevant, but when saving a
    received resource as a file, the correct encoding declaration should
    be inserted.

shall be inserted. 

[if the application claims to save as XML rather than saving as a bunch of
stuff with pointy brackets. If it fails to do so, then the rules for static
storage explain what happens when the file is next parsed: a WF error.]
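
As a rough illustration only, the amended text/* rule above could be sketched
as follows. The function and parameter names are hypothetical, and "BOM" is
assumed here to mean a UTF-16 byte order mark at the very start of the body:

```python
import codecs

def text_xml_charset(charset_param, body):
    """Return the charset of a text/* XML entity under the amended rule.

    charset_param: value of the MIME charset parameter, or None if absent
                   (hypothetical interface, not part of any RFC).
    body:          the raw entity body as bytes.
    """
    # An explicit charset parameter is authoritative; any encoding
    # declaration inside the document is irrelevant for text/*.
    if charset_param:
        return charset_param.lower()
    # No parameter: the charset is UTF-16 if a BOM is present ...
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # ... and UTF-8 otherwise. The HTTP and mail defaults do not apply.
    return "utf-8"
```

Under this sketch, a sender using iso-8859-1 has to supply the parameter,
because without it the receiver can only ever conclude UTF-16 or UTF-8.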

- XML sent as application/xml (or equivalent):
  - Charset parameter is strongly recommended, and if present,
    it takes precedence.

Charset parameter is *disallowed*.

  - If the charset parameter is omitted, the rules for XML in static storage
    are followed (see below).

The rules for XML in static storage are followed. Such files may be freely
saved to static storage without modification in all cases.

- XML in static storage without external metainformation (e.g. file):
  - Default is UTF-8, or UTF-16 if there is a BOM

For files without an explicit encoding declaration, the file is in UTF-16
if there is a BOM and UTF-8 if there is not.

  - For other things, there has to be an encoding declaration
  - There is some provision for 'error recovery'. What exactly this
    means is currently under discussion in the XML Core WG, so that
    it can be clarified.

"Some provision"????

There is no provision for error recovery, and if a file does not parse for
whatever reason, then it shall be a well-formedness error.
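
A minimal sketch of the static-storage rule above, which would also apply to
application/xml under the amendment. The names are hypothetical, and the
sketch assumes the encoding declaration, when present, is written in an
ASCII-compatible encoding (so EBCDIC and similar cases are not handled):

```python
import codecs
import re

# Matches an encoding declaration at the very start of an
# ASCII-compatible byte stream, e.g. <?xml version="1.0" encoding="x"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)

def stored_xml_encoding(data):
    """Return the encoding of XML bytes that carry no external metadata."""
    # A UTF-16 byte order mark means the file is UTF-16.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Anything other than UTF-8 or UTF-16 needs an encoding declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the file is UTF-8.
    return "utf-8"
```

A caller following the "no error recovery" position would decode strictly
with the returned encoding and report any failure as a well-formedness error
rather than guessing at another encoding.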

--
Chris