mail-ng

Re: mailbox format(s)

2004-02-27 11:23:50

On 27 feb 2004, at 0:26, Bruce Lilly wrote:

I quite strongly agree with Iljitsch on this one.
Having a well-defined format that can be used for files
is very helpful. The biggest advantage would be that it
makes it easier to switch MUAs.

If the mailbox is accessed via a standard network protocol such as
POP or IMAP, it is trivially easy to switch MUAs.

Not quite. Then the new MUA must download all the messages again from the server. Note that ISPs often don't allow customers to use their mailbox for long-term storage, so in practice this doesn't work at all for many users. The alternative would be to run a local IMAP-like service, which seems excessive and still doesn't allow the use of more than one program for downloading mail, just more than one program for displaying it.

A single format might not scale
well; what works for an organization with plenty of resources
might not work at all for a guy with a PDA or cell phone (and
vice versa).

I have more than 300 MB worth of mail on my server. My laptop keeps a complete copy of this, but I don't think it makes sense for a PDA to do the same: just caching the most recently read messages would be a better choice.

The format could come in single-message and multiple-message (like
current mbox) variants. The latter should just be a concatenation
of the former, or otherwise a very trivial transform.

Been there, done that. Never again.  A flat file just doesn't
work well with even a modest number of messages.

It can work if you build an index and don't go around removing messages from the middle of the file too often.
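To illustrate the index idea, here is a minimal sketch in Python. It assumes a classic mbox file in which every message starts with a line beginning with "From " and in which such lines inside bodies have been escaped (for instance as ">From "). The function names build_offset_index and read_message are made up for this example.

```python
import io


def build_offset_index(stream):
    """Return the byte offset of every line that starts with b"From "."""
    offsets = []
    pos = stream.tell()
    # readline() returns b"" at end of file, which stops the iteration.
    for line in iter(stream.readline, b""):
        if line.startswith(b"From "):
            offsets.append(pos)
        pos += len(line)
    return offsets


def read_message(stream, offsets, n):
    """Seek straight to message n without reading the messages before it."""
    start = offsets[n]
    stream.seek(start)
    if n + 1 < len(offsets):
        return stream.read(offsets[n + 1] - start)
    return stream.read()
```

Appending a message only adds one offset at the end of the index. Removing a message from the middle shifts every later offset, which is why that operation should stay rare.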

Cyrus IMAP stores one
message per file, with a database for metadata (access lists,
etc.), and it's quite fast.

Maybe for random access, but if you need to access all messages you're bound to be slower. Also, the file system overhead makes this a pretty bad idea.

What I imagine is a system where messages are stored in a binary container format such as IFF/AIFF/ASF/AVI when they are created. A typical message would start with a header section (which includes the msgid), then a body section consisting of one or more text and/or binary parts and finally an optional signature section. These are concatenated and the whole thing is flagged as immutable in transit. The container format makes sure that it's easy to skip ahead to the part of the message that is of interest at any particular time. Then there are three other parts: an end-to-end control section, a local control section and any information left by intermediate systems. These sections can either be kept in a separate place and be linked to the main section through the use of the msgid, or they can simply be tacked on at the end of it.
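As a rough sketch of what such a container could look like, the Python below uses an IFF-like layout: a 4-byte tag, a 4-byte big-endian length, then the payload. The tag names HEAD, BODY and SIGN and the helper names are assumptions made up for this example, and real IFF also pads chunks to an even length, which is left out here.

```python
import io
import struct


def pack_chunk(tag, payload):
    """Serialize one chunk: 4-byte tag, 4-byte big-endian length, payload."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    return tag + struct.pack(">I", len(payload)) + payload


def pack_message(header, parts, signature=None):
    """Concatenate a header chunk, one chunk per body part, optional signature."""
    out = pack_chunk(b"HEAD", header)
    for part in parts:
        out += pack_chunk(b"BODY", part)
    if signature is not None:
        out += pack_chunk(b"SIGN", signature)
    return out


def find_chunk(stream, wanted):
    """Return the payload of the first chunk tagged `wanted`, or None.

    Chunks with other tags are skipped with seek(), so their payloads
    are never read or parsed.
    """
    while True:
        head = stream.read(8)
        if len(head) < 8:
            return None
        tag, length = struct.unpack(">4sI", head)
        if tag == wanted:
            return stream.read(length)
        stream.seek(length, io.SEEK_CUR)
```

Because every chunk carries its own length, a reader that only wants the signature can jump over the header and all body parts, and binary parts need no encoding.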

A mailbox can then simply consist of a number of these messages that may or may not be concatenated into larger files. The control sections can be split off or copied to a different location and be used as an index if this is desired. But it's not really necessary to do that as searching (on header fields) through a large mailbox is fairly efficient: read the header, skip the body, read the control sections, next message. (The length of each section is specified in the container format so there is no need to parse the whole body and no 8bit cleanliness issues.)
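The "read the header, skip the body, read the control sections, next message" loop could look roughly like this in Python. The chunk layout (4-byte tag, 4-byte big-endian length, payload) and the tags HEAD, BODY and CTRL are assumptions for illustration only, as is the convention that a HEAD chunk marks the start of a new message.

```python
import io
import struct


def chunk(tag, payload):
    """Serialize one chunk: 4-byte tag, 4-byte big-endian length, payload."""
    return tag + struct.pack(">I", len(payload)) + payload


def scan_headers(stream):
    """Yield (offset, header, control) for each message in the mailbox.

    Only HEAD and CTRL payloads are read. Every other chunk is skipped
    by seeking past its declared length.
    """
    current = None
    while True:
        offset = stream.tell()
        head = stream.read(8)
        if len(head) < 8:
            break
        tag, length = struct.unpack(">4sI", head)
        if tag == b"HEAD":
            if current is not None:
                yield tuple(current)
            current = [offset, stream.read(length), b""]
        elif tag == b"CTRL" and current is not None:
            current[2] += stream.read(length)
        else:
            stream.seek(length, io.SEEK_CUR)
    if current is not None:
        yield tuple(current)
```

A search on a header field then only touches a few hundred bytes per message, no matter how large the attachments are, and the yielded offsets can double as an index if one is wanted.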

I think neither one message per file nor all messages in one file is the best idea; grouping messages in files of a few hundred kB to several MB is probably better. We can add some padding after any embedded control/logging sections (which can just as easily be trailers as headers) so that when those sections grow, space can be borrowed from the padding.
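Borrowing space from the padding might work as in the following Python sketch. It assumes a control chunk tagged CTRL that is immediately followed by a padding chunk tagged "PAD ", both using a 4-byte tag and a 4-byte big-endian length. These tags and the function names are invented for this example.

```python
import struct

HDR = 8  # bytes taken by the tag and length fields of one chunk


def chunk(tag, payload):
    """Serialize one chunk: 4-byte tag, 4-byte big-endian length, payload."""
    return tag + struct.pack(">I", len(payload)) + payload


def control_with_padding(control, reserve):
    """A control chunk followed by `reserve` bytes of spare room."""
    return chunk(b"CTRL", control) + chunk(b"PAD ", b"\0" * reserve)


def grow_control(buf, offset, new_control):
    """Rewrite the CTRL chunk at `offset` in place, shrinking the padding.

    Returns False if the padding is too small. On success the total size
    of `buf` is unchanged, so nothing after the padding has to move.
    """
    tag, clen = struct.unpack_from(">4sI", buf, offset)
    if tag != b"CTRL":
        raise ValueError("no CTRL chunk at this offset")
    pad_off = offset + HDR + clen
    ptag, plen = struct.unpack_from(">4sI", buf, pad_off)
    if ptag != b"PAD ":
        raise ValueError("CTRL chunk is not followed by padding")
    total = clen + plen  # payload bytes shared by both chunks
    if len(new_control) > total:
        return False
    new_pad = total - len(new_control)
    buf[offset:pad_off + HDR + plen] = (
        chunk(b"CTRL", new_control) + chunk(b"PAD ", b"\0" * new_pad)
    )
    return True
```

When grow_control returns False the message would have to be rewritten elsewhere, for example at the end of the file, so the amount of padding is a trade-off between wasted space and how often that happens.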

Now obviously this is just one idea, and I'm not saying an eventual solution must be like this. I'm just trying to show there is more than one way to skin a cat.

On the way out, not worth caring about too much
(as Iljitsch said, nobody should be forced to use
a format).

If it won't work for some systems (remember that bit about
heterogeneity), what is the point of having a standard?

Standards improve quality, because they usually eliminate inferior ways of achieving a result. And often vendors that implement their own solution also support the standard to some extent in order to be compatible.

