mail-ng

Re: Less is more

2004-04-27 14:22:23

On 27-apr-04, at 15:30, Brett Watson wrote:

>> One of my hopes for mail-ng is that it will be easy to implement,
>> and not be forgiving about bad implementations.

> The usual principle is Postel's Robustness Principle, which states
> that an endpoint should be strict in what it sends, but liberal in
> what it accepts. Some have argued (as you do) that this is not
> appropriate for an application protocol, as it encourages sloppy
> implementation.

What we need are specifications that are not open to more than one interpretation, so that something is either correct or it isn't. Obviously there will always be incorrect implementations, but if we minimize the wiggle room, people will be much more inclined to clean up their act rather than reinterpret the specs so they can claim correctness.

>> * Use an easily parsed timestamp (my advice: 64-bit number
>>   representing timestamp and offset from UTC)

> I applaud the general principle of "easily parsed timestamp", but
> there's an enormous can of worms waiting to be opened if you want to
> discuss this in any detail.

By all means, open the can.  :-)

> Let's be sure that we know what purpose the timestamp is being used
> for before we discuss the matter in more detail.

Speaking as someone who reads mail, I want to know (see the sketch after this list):

- when the message was sent (local time for the sender)
- when the message was sent (my time)
- when the message was received (my time)
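
To make this concrete, here is a minimal sketch (Python) of the 64-bit
proposal quoted above. The bit split (48 bits of UTC seconds, 16 bits
of signed offset in minutes) is my own assumption, not anything agreed
here:

    from datetime import datetime, timedelta, timezone

    def pack(seconds_utc, offset_minutes):
        # Pack UTC seconds and the sender's UTC offset into one
        # 64-bit value (bit split assumed, see above).
        return (seconds_utc << 16) | (offset_minutes & 0xFFFF)

    def unpack(value):
        seconds = value >> 16
        offset = value & 0xFFFF
        if offset >= 0x8000:          # sign-extend the 16-bit offset
            offset -= 0x10000
        return seconds, offset

    # A message sent at 15:30 local time from UTC+2:
    stamp = pack(1083072600, 120)
    seconds, offset = unpack(stamp)
    sent_utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    print(sent_utc + timedelta(minutes=offset))  # sent, sender's time
    print(sent_utc.astimezone())                 # sent, my local time

The first two items fall out of a single field this way; the third
(when the message was received) would have to come from a stamp the
receiving system adds itself.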

>> * Use one and only one charset (my advice: utf-8)

> Does anyone see any major drawback in using UTF-8 only?

There are two issues to consider: the character set and the encoding. It doesn't make any sense to use any character set other than Unicode. As for the encoding, there are three typical cases:

1. Latin (7-bit ASCII) characters with the occasional Unicode character
2. Non-Latin characters that otherwise fit in 8 bits (Greek, Cyrillic)
3. Non-Latin characters that can't be encoded in 8 bits (Chinese et al.)

For 1. the obvious encoding would be UTF-8. For 3. a 16- or even 32-bit Unicode encoding would make more sense, but there are many of those, so we need to figure out how to handle that: do we keep the text in the original encoding, or do we impose encoding rules? This is important with regard to message signing. That leaves 2., which is problematic: either we use 16 bits for those characters, which is wasteful, or we cram everything into 8 bits, which is complex. Or maybe supporting a limited set of legacy character sets isn't so bad after all.
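
A back-of-the-envelope comparison (Python; the sample strings are
mine) shows why 2. is the awkward case: Greek takes one byte per
character in a legacy charset like ISO 8859-7 but two in UTF-8 and
UTF-16, while Chinese is actually more compact in UTF-16 than UTF-8:

    samples = [
        ("1. mostly ASCII", "Hello, naive world"),
        ("2. Greek",        "Καλημέρα κόσμε"),
        ("3. Chinese",      "你好，世界"),
    ]
    for label, text in samples:
        utf8 = len(text.encode("utf-8"))
        utf16 = len(text.encode("utf-16-le"))
        print(f"{label}: {len(text)} chars, "
              f"UTF-8 {utf8} B, UTF-16 {utf16} B")

    # The legacy 8-bit encoding for case 2:
    print(len("Καλημέρα κόσμε".encode("iso-8859-7")))  # 14 bytes
    print(len("Καλημέρα κόσμε".encode("utf-8")))       # 27 bytes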

> This need not preclude other charsets from being carried in mail, but
> they would be treated as opaque binary data by the mail protocol.
> UTF-8 would be the exclusive encoding for parsed elements.

Disagree. Remember the talks we had earlier on this list about non-ASCII header fields, especially the email address? Obviously it's OK to _carry_ non-Unicode text in email as opaque data, but this would be similar to attaching a Word document to a message: you don't know whether the other end can decode it. The range of possible character sets and encodings needs to be as limited as it can be.

>> * Use an easily implemented envelope (my advice: xml or
>>   xml-lookalike, with data-size attribute for a scheme identical to
>>   IMAP4 literals to prevent a need for escaping)

> Discussion of XML is premature until we have some idea what data we
> need to transfer at particular moments. XML is good for certain types
> of structured data, but there may be better approaches for very
> lightweight messages or heavily binary-oriented messages.

We've already had long discussions about XML, no need to repeat them (just yet), but suffice it to say that I really like the idea behind binary container formats such as AVI.
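
For those who missed the earlier discussion, the IMAP4-literal scheme
referenced in the quoted proposal is simple to sketch (Python; the
exact wire format here is my guess at what was meant). The sender
declares the byte count up front, so the payload is read verbatim and
never needs escaping:

    def parse_literal(buf, pos):
        # Parse '{n}\r\n' followed by exactly n raw bytes, starting
        # at pos; return (payload, position after the payload).
        end = buf.index(b"}", pos)
        size = int(buf[pos + 1:end])
        start = end + 3                 # skip over '}\r\n'
        return buf[start:start + size], start + size

    # '<' and '>' pass through untouched, unlike in plain XML:
    payload, _ = parse_literal(b"{6}\r\nhe<l>o", 0)
    print(payload)                      # b'he<l>o'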

> I would like to have a header specifying the jurisdiction the email
> is sent by.

This is an excellent example of something that should NOT be in the core email protocols, but should definitely be supportable using extensions to those core specifications for the people who want/need it.

