I came up with this for a different purpose, but Dan Connolly
suggested it might be added to an update of RFC 2376, as a
quick overview:
-----
There are three basic situations:
- XML sent (e.g. by mail or HTTP) as text/xml (or an equivalent, e.g.
  text/vnd.wap.wml):
  - A charset parameter is strongly recommended.
  - If there is no charset parameter, the default is US-ASCII. The default
    of iso-8859-1 in HTTP is explicitly overridden in the specification of
    the charset parameter in section 3.1 "Text/xml Registration" of
    RFC 2376 (http://www.ietf.org/rfc/rfc2376.txt).
  - There are no error handling provisions.
  - An encoding declaration, if present, is irrelevant, but when saving a
    received resource as a file, the correct encoding declaration should
    be inserted.
- XML sent as application/xml (or an equivalent):
  - A charset parameter is strongly recommended, and if present,
    it takes precedence.
  - If the charset parameter is omitted, the rules for XML in static
    storage are followed (see below).
- XML in static storage without external metainformation (e.g. a file):
  - The default is UTF-8, or UTF-16 if there is a BOM.
  - For any other encoding, there has to be an encoding declaration.
  - There is some provision for 'error recovery'. What exactly this
    means is currently under discussion in the XML Core WG, so that
    it can be clarified.
-----
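For illustration only, the three situations above could be sketched in
Python roughly as follows. The function names effective_encoding and
encoding_from_content are made up for this sketch, and treating every
text/* type as "equivalent" to text/xml is an assumption. The sniffing of
the encoding declaration is simplified: it only works for ASCII-compatible
encodings, and error recovery is not attempted.

```python
import re

# Simplified matcher for an encoding declaration at the very start of the data.
# Assumption: the document is in an ASCII-compatible encoding.
_DECL = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def encoding_from_content(data):
    """Static-storage rules: BOM means UTF-16, else declaration, else UTF-8."""
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "utf-16"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


def effective_encoding(media_type, charset, data):
    """Pick the encoding for an XML entity.

    media_type is None for static storage (e.g. a file);
    charset is None when no charset parameter was sent.
    """
    if charset:
        # A charset parameter, if present, takes precedence in both MIME cases.
        return charset.lower()
    if media_type is not None and media_type.lower().startswith("text/"):
        # text/xml and equivalents: US-ASCII, whatever the declaration says.
        return "us-ascii"
    # application/xml without charset, or static storage.
    return encoding_from_content(data)
```

For example, a document that declares encoding="ISO-8859-1" but arrives as
text/xml without a charset parameter comes out as us-ascii, while the same
bytes sent as application/xml come out as iso-8859-1.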
Regards, Martin.
#-#-# Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-# mailto:duerst(_at_)w3(_dot_)org http://www.w3.org/People/D%C3%BCrst