On 27-apr-04, at 15:30, Brett Watson wrote:
One of my hopes for mail-ng is that it will be easy to implement, and not be forgiving about bad implementations.
The usual principle is "Postel's Robustness Principle", which states that an endpoint should be strict in what it sends, but liberal in what it accepts.
Some have argued (as you do) that this is not appropriate for an application protocol, as it encourages sloppy implementation.
What we need are specifications that are not open to more than one
interpretation. So either something is correct or it isn't. Obviously
there will always be incorrect implementations, but if we minimize the
wiggle room, people will be much more inclined to clean up their act
rather than reinterpret the specs so they can claim correctness.
* Use an easily parsed timestamp (my advice: a 64-bit number representing the timestamp and offset from UTC)
I applaud the general principle of "easily parsed timestamp", but there's an enormous can of worms waiting to be opened if you want to discuss this in any detail.
By all means, open the can. :-)
Let's be sure that we know what purpose the timestamp is being used
for before we discuss the matter in more detail.
Speaking as someone who reads mail, I want to know:
- when the message was sent (local time for the sender)
- when the message was sent (my time)
- when the message was received (my time)
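To make the suggestion above concrete, here is a minimal sketch in Python, assuming a hypothetical layout of the 64 bits: 48 bits of seconds since the Unix epoch plus a signed 16-bit offset from UTC in minutes. The split and the function names are my own illustration, not a settled proposal; it only shows that the first two readings above can be derived from one number (the third, when the message was received, would come from the receiving system's own clock rather than from this field).

    import datetime

    def pack(utc_seconds, offset_minutes):
        # upper 48 bits: seconds since the Unix epoch (UTC)
        # lower 16 bits: sender's offset from UTC in minutes, two's complement
        return (utc_seconds << 16) | (offset_minutes & 0xFFFF)

    def unpack(value):
        offset = value & 0xFFFF
        if offset >= 0x8000:
            offset -= 0x10000        # restore the sign
        return value >> 16, offset

    def readings(value, my_offset_minutes):
        utc_seconds, sender_offset = unpack(value)
        sent_utc = datetime.datetime.fromtimestamp(utc_seconds, datetime.timezone.utc)
        sender_tz = datetime.timezone(datetime.timedelta(minutes=sender_offset))
        my_tz = datetime.timezone(datetime.timedelta(minutes=my_offset_minutes))
        return {
            "sent (sender local time)": sent_utc.astimezone(sender_tz),
            "sent (my time)": sent_utc.astimezone(my_tz),
        }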
* Use one and only one charset (my advice: utf-8)
Does anyone see any major drawback in using UTF-8 only?
There are two issues to consider: the character set and the encoding.
It doesn't make any sense to use any other character set than Unicode.
As for the encoding, there are three typical cases:
1. Latin (7 bit ASCII) characters with the occasional Unicode character
2. Non-latin characters that otherwise fit in 8 bits (Greek, Cyrillic)
3. Non-latin characters that can't be encoded in 8 bits (Chinese et al.)
For 1. the obvious encoding would be UTF-8. For 3. a 16- or even 32-bit
Unicode encoding would make more sense, but there are many of those, so
we need to figure out how to handle that. Do we keep the text in the
original encoding or do we make encoding rules? This is important with
regard to message signing. So this leaves 2., which is problematic:
either we use 16 bits for those characters, which is wasteful, or we
cram everything into 8 bits, which is complex. Or maybe supporting a
limited set of legacy character sets isn't so bad after all.
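A quick illustration of the trade-off for case 2, assuming Greek as the example (the sample strings are mine): a legacy 8-bit charset such as ISO 8859-7 spends one byte per character, while UTF-8 spends two, and plain ASCII costs the same either way.

    samples = {
        "case 1 (mostly ASCII)": "Hello, world",
        "case 2 (Greek)": "Καλημέρα",
        "case 3 (Chinese)": "你好世界",
    }
    for name, text in samples.items():
        print(name, "chars:", len(text),
              "utf-8:", len(text.encode("utf-8")),
              "utf-16:", len(text.encode("utf-16-le")))

    # For case 2, the legacy single-byte charset halves the size:
    print(len("Καλημέρα".encode("iso8859-7")), "bytes in ISO 8859-7 vs",
          len("Καλημέρα".encode("utf-8")), "bytes in UTF-8")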
This need not preclude other charsets from being carried in mail, but they would be treated as opaque binary data by the mail protocol. UTF-8 would be the exclusive encoding for parsed elements.
Disagree. Remember the talks we had earlier on this list about
non-ASCII header fields, especially the email address? Obviously it's
OK to _carry_ non-Unicode text in email as opaque data, but this would
be similar to attaching a Word document to a message: you don't know
whether the other end can decode it. The range of possible character
sets and encodings needs to be as limited as it can be.
* Use an easily implemented envelope (my advice: XML or an XML lookalike, with a data-size attribute for a scheme identical to IMAP4 literals to prevent a need for escaping)
Discussion of XML is premature until we have some idea what data we need to transfer at particular moments. XML is good for certain types of structured data, but there may be better approaches for very lightweight messages or heavily binary-oriented messages.
We already had long talks about XML, no need to repeat this (just yet),
but suffice it to say that I really like the idea behind binary
container formats such as AVI.
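To illustrate the IMAP4-literal idea from the quoted suggestion, here is a rough sketch, assuming an invented element syntax in which a size attribute announces how many raw bytes follow, so the payload never needs escaping. The element and attribute names are made up for the example.

    import io

    def emit(field, payload):
        # "payload" is raw bytes; the size attribute tells the reader
        # exactly how many bytes to consume, so no escaping is needed.
        head = "<{0} size={1}>".format(field, len(payload)).encode("utf-8")
        tail = "</{0}>".format(field).encode("utf-8")
        return head + payload + tail

    def read(stream):
        # read up to the end of the opening tag, then exactly `size` bytes
        header = b""
        while not header.endswith(b">"):
            header += stream.read(1)
        name, size = header[1:-1].decode("utf-8").split(" size=")
        payload = stream.read(int(size))
        stream.read(len(name) + 3)   # consume the closing </name> tag
        return name, payload

    blob = emit("subject", "héllo, wörld".encode("utf-8"))
    print(read(io.BytesIO(blob)))    # ('subject', b'h\xc3\xa9llo, ...')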
I would like to have a header specifying the jurisdiction under which
the email is sent.
This is an excellent example of something that should NOT be in the
core email protocols, but should definitely be supportable using
extensions to those core specifications for the people who want/need
it.