Re: What I see as problems to solve ... and a strawman solution


paullocal(_at_)pscs(_dot_)co(_dot_)uk (Paul Smith)  wrote on 30.01.04 in 
<6(_dot_)0(_dot_)0(_dot_)22(_dot_)2(_dot_)20040130122733(_dot_)1b072278(_at_)lmail(_dot_)pscs(_dot_)co(_dot_)uk·2>:

> No. I don't like XML when it's not needed.
> I would see this as overcomplicating. Personally I don't have much of a
> problem with header formats, envelopes, and header/body separation at the
> moment.


This is strictly aimed at headers (I certainly don't see it as useful to
put the bodies in there), and my argument is that XML is *vastly* simpler
than what we have today with mail headers, which is a rather
uncomplimentary thing to say about current mail headers.

As far as I can make out, the RFC-covered area currently has three  
(entirely) different "general" technologies to describe small amounts of  
heterogeneous data: the RFC-822 family, XML, and ASN.1 (for example in
SNMP).

Of those, RFC 822 started out simple but has by now acquired so many  
layers of complications that it is anything but, and is IMHO a main reason  
we are here in the first place. So strike that. (It's still useful in  
areas where you can ignore all those complications, of course. For  
example, the Debian package control file format uses that general syntax,  
but there's no MIME stuff, no need to use different character sets, no  
need to support various different email syntaxes including groups, no  
comments ... mainly field names, field contents, and (in one field) line  
wrapping. I just fail to see how we could ever get back there with mail.)
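For comparison, the stripped-down 822-style syntax described above (field names, field contents, indented continuation lines, nothing else) really is small enough to parse in a few lines. The following is a minimal Python sketch under that assumption; `parse_control_paragraph` is a hypothetical name, and this is a simplified reading of the general syntax rather than a complete Debian control-file parser.

```python
from typing import Dict, Optional


def parse_control_paragraph(text: str) -> Dict[str, str]:
    """Parse one paragraph of 'Name: value' fields with indented continuation lines.

    Simplified sketch: no MIME, no character-set handling, no comments.
    """
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the paragraph
        if line[0] in " \t":
            # Continuation line: belongs to the previous field's value.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            # Field line: split on the first colon only.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields
```

A wrapped `Description:` field, for example, comes back as a single dictionary entry whose value keeps the wrapped line on its own line.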

ASN.1 is pretty far from where we are now (though admittedly it is what
X.400, once the "other" mail format, used), and it seems to be much harder
to extend compatibly - and since extension is exactly where our problems
with 822 came from, that doesn't sound like a good choice.

XML - while still more complicated than one might wish - certainly has  
support for everything we might want to do. It is a text format, just like  
822. And it is, by now, very widely deployed - it is hard to imagine  
anyone needing to manipulate mail who does not have access to at least one  
XML toolkit.

And while XML, again, still seems more complicated than necessary, it is
significantly less complicated than current 822+friends, plus it is fairly
easy to make it even less complicated just by choosing some rules on how to
use it, which as far as I can see we're certainly entitled to do. We don't
*have* to use all the options, so long as what we do use still follows the  
rules.

For example, we can say that only one character set is allowed; that only  
certain attributes are allowed (for example, those specifying language),  
and everything else has to be done with tags and text; that only one date  
format is supported; that XML comments are not allowed; and so on. This  
should then even make it feasible to write do-it-yourself XML parsing and  
generating code - not that I would necessarily recommend that.
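A restricted profile like this can still be checked with a stock XML toolkit. The sketch below uses Python's standard `xml.etree.ElementTree`; the concrete rules (UTF-8 as the only character set, `xml:lang` as the only permitted attribute, one ISO 8601-style format for a hypothetical `<date>` element, no comments) and all names are illustrative assumptions, not an agreed specification.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Illustrative profile rules (assumptions, not a spec).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ALLOWED_ATTRIBUTES = {XML_LANG}            # only the language attribute
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"         # the one permitted date format
ENCODING_DECL = re.compile(rb"""<\?xml[^>]*encoding\s*=\s*["']([^"']+)["']""")


class _NoCommentBuilder(ET.TreeBuilder):
    """Tree builder that rejects XML comments anywhere in the document."""

    def comment(self, text):
        raise ValueError("XML comments are not allowed")


def check_header_profile(data: bytes) -> ET.Element:
    """Parse an XML header; raise ValueError if it breaks a profile rule."""
    # Rule: a single character set (UTF-8 here).
    decl = ENCODING_DECL.match(data)
    if decl and decl.group(1).lower() != b"utf-8":
        raise ValueError("only UTF-8 is allowed")
    data.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass

    # Rule: no comments (enforced by the builder while parsing).
    parser = ET.XMLParser(target=_NoCommentBuilder())
    parser.feed(data)
    root = parser.close()

    for elem in root.iter():
        # Rule: only whitelisted attributes.
        extra = set(elem.attrib) - ALLOWED_ATTRIBUTES
        if extra:
            raise ValueError(f"attributes not allowed on <{elem.tag}>: {sorted(extra)}")
        # Rule: every (hypothetical) <date> element uses the one date format.
        if elem.tag == "date":
            datetime.strptime((elem.text or "").strip(), DATE_FORMAT)
    return root
```

Malformed XML is still reported by ElementTree's own `ParseError`; the checks above only add the profile rules on top of ordinary well-formedness.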

Regards, Kai

PS. I just remembered one extra detail that seems relevant to my strawman  
proposal.

The original Content-Length: header field sat in the middle of the header;
even in XML, finding it would require parsing the header first.

Much better to start every header with a line containing the length of the  
entity.

That means every entity would be

    digit+ CRLF XMLheader body

where the body starts immediately after the closing XML tag (or one could  
insert a CRLF there, too) and the total number of bytes after that first  
CRLF is the number encoded before it.

That makes it fairly trivial to seek around within a message.
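A minimal Python sketch of reading this framing, assuming (hypothetically) that such entities are stored back to back, for example the parts of one message. `iter_entities` reads only the length lines, so it can skip from one entity to the next without parsing any XML; `split_header` then finds where the XML header ends inside one entity, using the standard `xml.parsers.expat` module and its `CurrentByteIndex` offset. Both function names are made up for illustration, and an optional CRLF after the closing tag is simply left at the start of the body.

```python
import xml.parsers.expat
from typing import Iterator, Tuple


def iter_entities(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of back-to-back 'digit+ CRLF entity' records.

    Only the length lines are read; headers and bodies are skipped unparsed.
    """
    pos = 0
    while pos < len(data):
        eol = data.find(b"\r\n", pos)
        if eol == -1 or not data[pos:eol].isdigit():
            raise ValueError(f"no length line at offset {pos}")
        start = eol + 2
        end = start + int(data[pos:eol])
        if end > len(data):
            raise ValueError(f"truncated entity at offset {pos}")
        yield start, end
        pos = end


class _RootClosed(Exception):
    """Internal signal: the XML header's root element has ended."""


def split_header(entity: bytes) -> Tuple[bytes, bytes]:
    """Split one entity into (XML header, body) just after the root element."""
    parser = xml.parsers.expat.ParserCreate()
    depth = 0
    root_has_content = False  # child elements, text, comments, ... inside the root
    header_end = None

    def start(name, attrs):
        nonlocal depth, root_has_content
        if depth > 0:
            root_has_content = True
        depth += 1

    def other(data):
        # DefaultHandler receives text, comments, PIs etc., because no
        # specific handlers for those are installed.
        nonlocal root_has_content
        if depth > 0:
            root_has_content = True

    def end(name):
        nonlocal depth, header_end
        depth -= 1
        if depth > 0:
            return
        pos = parser.CurrentByteIndex
        if not root_has_content and entity[pos - 2:pos] == b"/>":
            # Empty root such as <header/>: pos is already just past the tag.
            header_end = pos
        else:
            # pos is at the '<' of the root's closing (or empty) tag.
            header_end = entity.index(b">", pos) + 1
        raise _RootClosed  # stop before expat reaches the (non-XML) body

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.DefaultHandler = other
    try:
        parser.Parse(entity, True)
    except _RootClosed:
        pass
    except xml.parsers.expat.ExpatError as exc:
        raise ValueError(f"bad XML header: {exc}") from None
    if header_end is None:
        raise ValueError("no complete XML header found")
    return entity[:header_end], entity[header_end:]
```

Seeking to the n-th entity then costs one short line read per skipped entity, and only the entity actually wanted ever goes through an XML parser.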