mail-ng

Re: Less is more

2004-04-27 14:22:23

On 27-apr-04, at 15:30, Brett Watson wrote:

>> One of my hopes for mail-ng is that it will be easy to implement,
>> and not be forgiving about bad implementations.

> The usual principle is Postel's Robustness Principle, which states
> that an endpoint should be strict in what it sends, but liberal in
> what it accepts. Some have argued (as you do) that this is not
> appropriate for an application protocol, as it encourages sloppy
> implementation.

What we need are specifications that are not open to more than one interpretation, so that something is either correct or it isn't. Obviously there will always be incorrect implementations, but if we minimize the wiggle room, people will be much more inclined to clean up their act rather than reinterpret the specs so they can claim correctness.

>> * Use an easily parsed timestamp (my advice: 64-bit number
>>   representing timestamp and offset from UTC)

> I applaud the general principle of "easily parsed timestamp", but
> there's an enormous can of worms waiting to be opened if you want to
> discuss this in any detail.

By all means, open the can.  :-)

> Let's be sure that we know what purpose the timestamp is being used
> for before we discuss the matter in more detail.

Speaking as someone who reads mail, I want to know (see the sketch after this list):

- when the message was sent (local time for the sender)
- when the message was sent (my time)
- when the message was received (my time)
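
To make this concrete, here is a minimal sketch (Python) of the 64-bit
proposal quoted above. The bit split (48 bits of UTC seconds, 16 bits
of signed offset in minutes) is my own assumption, not anything agreed
here:

    from datetime import datetime, timedelta, timezone

    def pack(seconds_utc, offset_minutes):
        # Pack UTC seconds and the sender's UTC offset into one
        # 64-bit value (bit split assumed, see above).
        return (seconds_utc << 16) | (offset_minutes & 0xFFFF)

    def unpack(value):
        seconds = value >> 16
        offset = value & 0xFFFF
        if offset >= 0x8000:          # sign-extend the 16-bit offset
            offset -= 0x10000
        return seconds, offset

    # A message sent at 15:30 local time from UTC+2:
    stamp = pack(1083072600, 120)
    seconds, offset = unpack(stamp)
    sent_utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    print(sent_utc + timedelta(minutes=offset))  # sent, sender's time
    print(sent_utc.astimezone())                 # sent, my local time

The first two items fall out of a single field this way; the third
(when the message was received) would have to come from a stamp the
receiving system adds itself.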

>> * Use one and only one charset (my advice: utf-8)

> Does anyone see any major drawback in using UTF-8 only?

There are two issues to consider: the character set and the encoding. It doesn't make any sense to use any character set other than Unicode. As for the encoding, there are three typical cases:

1. Latin (7-bit ASCII) characters with the occasional Unicode character
2. Non-Latin characters that otherwise fit in 8 bits (Greek, Cyrillic)
3. Non-Latin characters that can't be encoded in 8 bits (Chinese et al.)

For 1. the obvious encoding would be UTF-8. For 3. a 16- or even 32-bit Unicode encoding would make more sense, but there are many of those, so we need to figure out how to handle that: do we keep the text in the original encoding, or do we impose encoding rules? This is important with regard to message signing. That leaves 2., which is problematic: either we use 16 bits for those characters, which is wasteful, or we cram everything into 8 bits, which is complex. Or maybe supporting a limited set of legacy character sets isn't so bad after all.
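
A back-of-the-envelope comparison (Python; the sample strings are
mine) shows why 2. is the awkward case: Greek takes one byte per
character in a legacy charset like ISO 8859-7 but two in UTF-8 and
UTF-16, while Chinese is actually more compact in UTF-16 than UTF-8:

    samples = [
        ("1. mostly ASCII", "Hello, naive world"),
        ("2. Greek",        "Καλημέρα κόσμε"),
        ("3. Chinese",      "你好，世界"),
    ]
    for label, text in samples:
        utf8 = len(text.encode("utf-8"))
        utf16 = len(text.encode("utf-16-le"))
        print(f"{label}: {len(text)} chars, "
              f"UTF-8 {utf8} B, UTF-16 {utf16} B")

    # The legacy 8-bit encoding for case 2:
    print(len("Καλημέρα κόσμε".encode("iso-8859-7")))  # 14 bytes
    print(len("Καλημέρα κόσμε".encode("utf-8")))       # 27 bytes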

> This need not preclude other charsets from being carried in mail, but
> they would be treated as opaque binary data by the mail protocol.
> UTF-8 would be the exclusive encoding for parsed elements.

Disagree. Remember the talks we had earlier on this list about non-ASCII header fields, especially the email address? Obviously it's OK to _carry_ non-Unicode text in email as opaque data, but this would be similar to attaching a Word document to a message: you don't know whether the other end can decode it. The range of possible character sets and encodings needs to be as limited as it can be.

>> * Use an easily implemented envelope (my advice: xml or
>>   xml-lookalike, with data-size attribute for a scheme identical to
>>   IMAP4 literals to prevent a need for escaping)

> Discussion of XML is premature until we have some idea what data we
> need to transfer at particular moments. XML is good for certain types
> of structured data, but there may be better approaches for very
> lightweight messages or heavily binary-oriented messages.

We've already had long discussions about XML, no need to repeat them (just yet), but suffice it to say that I really like the idea behind binary container formats such as AVI.
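
For those who missed the earlier discussion, the IMAP4-literal scheme
referenced in the quoted proposal is simple to sketch (Python; the
exact wire format here is my guess at what was meant). The sender
declares the byte count up front, so the payload is read verbatim and
never needs escaping:

    def parse_literal(buf, pos):
        # Parse '{n}\r\n' followed by exactly n raw bytes, starting
        # at pos; return (payload, position after the payload).
        end = buf.index(b"}", pos)
        size = int(buf[pos + 1:end])
        start = end + 3                 # skip over '}\r\n'
        return buf[start:start + size], start + size

    # '<' and '>' pass through untouched, unlike in plain XML:
    payload, _ = parse_literal(b"{6}\r\nhe<l>o", 0)
    print(payload)                      # b'he<l>o'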

> I would like to have a header specifying the jurisdiction the email
> is sent by.

This is an excellent example of something that should NOT be in the core email protocols, but should definitely be supportable using extensions to those core specifications for the people who want/need it.

