paullocal(_at_)pscs(_dot_)co(_dot_)uk (Paul Smith) wrote on 30.01.04 in
<6(_dot_)0(_dot_)0(_dot_)22(_dot_)2(_dot_)20040130122733(_dot_)1b072278(_at_)lmail(_dot_)pscs(_dot_)co(_dot_)uk·2>:
> No. I don't like XML when it's not needed.
> I would see this as overcomplicating. Personally I don't have much of a
> problem with header formats, envelopes, and header/body separation at the
> moment.
This is strictly aimed at headers (I certainly don't see it as useful to
put the bodies in there), and my argument is that XML is *vastly* simpler
than what we have today with mail headers. Which is a rather
uncomplimentary thing to say about current mail headers.
As far as I can make out, the RFC-covered area currently has three
(entirely) different "general" technologies to describe small amounts of
heterogeneous data: the RFC-822 family, XML, and ASN.1 (for example in
SNMP).
Of those, RFC 822 started out simple but has by now acquired so many
layers of complications that it is anything but, and is IMHO a main reason
we are here in the first place. So strike that. (It's still useful in
areas where you can ignore all those complications, of course. For
example, the Debian package control file format uses that general syntax,
but there's no MIME stuff, no need to use different character sets, no
need to support various different email syntaxes including groups, no
comments ... mainly field names, field contents, and (in one field) line
wrapping. I just fail to see how we could ever get back there with mail.)
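To show how small that subset is, here is a minimal Python sketch of a parser for it: field names, field contents, and continuation lines that start with whitespace. This is not Debian's own parser. It assumes a single stanza and skips details such as the " ." convention for blank lines inside multi-line fields and case-insensitive field names. The function name parse_stanza is hypothetical.

```python
def parse_stanza(text: str) -> dict[str, str]:
    """Parse 'Name: value' lines; a line starting with a space or tab
    continues the previous field's value (line wrapping)."""
    fields: dict[str, str] = {}
    current = None  # name of the field a continuation line belongs to
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines would separate stanzas; one stanza is assumed
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            # Split on the first colon only, so values may contain colons.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"missing ':' in {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


if __name__ == "__main__":
    stanza = (
        "Package: hello\n"
        "Version: 2.10-2\n"
        "Description: example package\n"
        " wrapped onto a second line\n"
    )
    print(parse_stanza(stanza)["Description"])
```

No character sets, no comments, no groups, no encoded words: the whole grammar fits in one loop, which is the point of the comparison with mail headers.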
ASN.1 is pretty far from where we are now (but admittedly it is what
X.400, once the "other" mail format, used), and it seems to be much harder
to extend compatibly - given that compatible extension is where our problems
with 822 came from, that doesn't sound like a good choice.
XML - while still more complicated than one might wish - certainly has
support for everything we might want to do. It is a text format, just like
822. And it is, by now, very widely deployed - it is hard to imagine
anyone needing to manipulate mail who does not have access to at least one
XML toolkit.
And while XML, again, still seems more complicated than necessary, it is
significantly less complicated than current 822+friends, plus it is fairly
easy to make it even less complicated just by choosing some rules on how to
use it, which as far as I can see we're certainly entitled to do. We don't
*have* to use all the options, so long as what we do use still follows the
rules.
For example, we can say that only one character set is allowed; that only
certain attributes are allowed (for example, those specifying language),
and everything else has to be done with tags and text; that only one date
format is supported; that XML comments are not allowed; and so on. This
should then even make it feasible to write do-it-yourself XML parsing and
generating code - not that I would necessarily recommend that.
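A rough sketch of what checking such a restricted profile could look like with nothing but Python's standard library (expat via xml.etree.ElementTree). The concrete rules are assumptions picked to mirror the examples above: UTF-8 as the only character set, xml:lang as the only allowed attribute, a hypothetical <date> element in one fixed ISO 8601 UTC format, and no comments anywhere. The names check_profile and DATE_TAGS are illustrative, not part of any proposal.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical profile rules, modelled on the examples in the text.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang
ALLOWED_ATTRIBUTES = {XML_LANG}        # only language tagging is allowed
DATE_TAGS = {"date"}                   # hypothetical element holding a date
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"     # the single accepted date format (UTC)
DECLARED_ENCODING = re.compile(rb"""^<\?xml[^>]*encoding\s*=\s*["']([^"']+)["']""")


class _NoCommentsBuilder(ET.TreeBuilder):
    """Tree builder that rejects XML comments anywhere, including the prolog."""

    def comment(self, text):
        raise ValueError("XML comments are not allowed")


def check_profile(data: bytes) -> ET.Element:
    """Parse an XML header, raising ValueError if it breaks the profile.
    Malformed XML is still reported by the parser as ET.ParseError."""
    # One character set: refuse any other declared encoding, and make sure
    # the bytes really are UTF-8 (UnicodeDecodeError is a ValueError).
    m = DECLARED_ENCODING.match(data)
    if m and m.group(1).upper() != b"UTF-8":
        raise ValueError(f"encoding {m.group(1).decode('ascii', 'replace')} not allowed")
    data.decode("utf-8")

    parser = ET.XMLParser(target=_NoCommentsBuilder())
    parser.feed(data)  # the comment() override fires here for any comment
    root = parser.close()

    for elem in root.iter():  # includes the root element itself
        extra = set(elem.attrib) - ALLOWED_ATTRIBUTES
        if extra:
            raise ValueError(f"attributes {sorted(extra)} not allowed on <{elem.tag}>")
        if elem.tag in DATE_TAGS:
            # strptime raises ValueError if the text is in any other format
            datetime.strptime((elem.text or "").strip(), DATE_FORMAT)
    return root
```

Each rule is a few lines on top of an existing parser, so the restrictions cut the number of cases a consumer must handle without anyone having to leave standard XML.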
Regards, Kai
PS. I just remembered one extra detail that seems relevant to my strawman
proposal.
The original Content-Length: was in the middle of the header; even in XML
it would require parsing the header first.
Much better to start every header with a line containing the length of the
entity.
That means every entity would be

    digit+ CRLF XMLheader body

where the body starts immediately after the closing XML tag (or one could
insert a CRLF there, too) and the total number of bytes after that first
CRLF is the number encoded before it.
Makes it fairly trivial to seek around in a mail.
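A minimal Python sketch of that framing, assuming the count is ASCII decimal and that the CRLF after the digits is not itself counted. The function names are hypothetical. Locating the next entity never touches the XML header: it is one readline and one seek. Splitting header from body is left out, since that would still mean finding the closing tag.

```python
import io
from typing import BinaryIO, Iterator


def write_entity(out: BinaryIO, xml_header: bytes, body: bytes) -> None:
    """Write one entity as: digit+ CRLF XMLheader body."""
    payload = xml_header + body
    out.write(str(len(payload)).encode("ascii") + b"\r\n")
    out.write(payload)


def entity_offsets(f: BinaryIO) -> Iterator[tuple[int, int]]:
    """Yield (start, length) of each entity's payload without parsing it."""
    end = f.seek(0, io.SEEK_END)  # total size of the stream
    f.seek(0)
    while f.tell() < end:
        line = f.readline()
        digits = line[:-2]
        if not line.endswith(b"\r\n") or not digits.isdigit():
            raise ValueError(f"bad length line at offset {f.tell() - len(line)}: {line!r}")
        length = int(digits)
        start = f.tell()
        if start + length > end:
            raise ValueError(f"entity at offset {start} is truncated")
        yield start, length
        f.seek(start + length)  # skip header and body unparsed


if __name__ == "__main__":
    mbox = io.BytesIO()
    write_entity(mbox, b"<header><subject>one</subject></header>", b"first body\r\n")
    write_entity(mbox, b"<header><subject>two</subject></header>", b"second body\r\n")
    for start, length in entity_offsets(mbox):
        mbox.seek(start)  # the generator re-seeks absolutely, so this is safe
        print(start, mbox.read(length))
```

Because the generator seeks to an absolute offset before reading the next length line, a caller can jump into any entity between steps without losing its place.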