Re: mailbox format(s)


Iljitsch van Beijnum wrote:


On 27-feb-04, at 21:34, Bruce Lilly wrote:

Not quite. Then the new MUA must download all the messages again from
the server.

Not with IMAP, where the message store typically lives on the
server



Well, then a new client would have to download all the messages from the
server again, wouldn't it?


No. If you're unfamiliar with how IMAP works, you can learn about
it at http://www.imap.org/about/whatisIMAP.html and
http://asg.web.cmu.edu/cyrus/cyrus-overview-TOC.html
Briefly, an IMAP client connecting to a user's inbox on an IMAP
server for the first time would likely ask for a list of unique
message identifiers, then might request specific message header
fields (depending on how the user interface is configured for
display), possibly using a (server-side) search.  Once enough
message header fields are obtained to display a screen of
information to the user (e.g. Subject, sender, size, date),
the client can stop requesting information from the server. A
message body only needs to be retrieved if the user wants to
display the full message body for a particular message.
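
The exchange described above can be sketched with Python's standard imaplib module. This is only an illustration under assumptions: the function names, the choice of header fields, and the shape of the returned dictionaries are made up for the example, and a real client would add caching and more careful response parsing.

```python
import email
import imaplib
import re

# What a message list needs: size plus a few header fields.
# The exact selection of fields is an assumption for this sketch.
SUMMARY_ITEMS = "(RFC822.SIZE BODY.PEEK[HEADER.FIELDS (SUBJECT FROM DATE)])"


def list_uids(conn, mailbox="INBOX"):
    """Ask the server for message UIDs; no message content is transferred."""
    conn.select(mailbox, readonly=True)
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise imaplib.IMAP4.error("UID SEARCH failed")
    return data[0].split() if data and data[0] else []


def fetch_summaries(conn, uids):
    """Fetch size and selected header fields for the given UIDs, not bodies."""
    if not uids:
        return []
    uid_set = b",".join(uids).decode("ascii")
    typ, data = conn.uid("FETCH", uid_set, SUMMARY_ITEMS)
    if typ != "OK":
        raise imaplib.IMAP4.error("UID FETCH failed")
    summaries = []
    for part in data:
        # imaplib returns (response line, literal) tuples plus b")" fillers.
        if not isinstance(part, tuple):
            continue
        size = re.search(rb"RFC822\.SIZE (\d+)", part[0])
        headers = email.message_from_bytes(part[1])
        summaries.append({
            "size": int(size.group(1)) if size else None,
            "subject": headers["Subject"],
            "from": headers["From"],
            "date": headers["Date"],
        })
    return summaries


def fetch_body(conn, uid):
    """Retrieve one full message, only when the user asks to read it."""
    typ, data = conn.uid("FETCH", uid.decode("ascii"), "(BODY.PEEK[])")
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        raise imaplib.IMAP4.error("UID FETCH failed")
    return data[0][1]
```

With a connection from imaplib.IMAP4_SSL (host and credentials being whatever your server requires), a first screen could be built from fetch_summaries(conn, list_uids(conn)[:25]), and fetch_body would run only when a message is opened.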

(which could very well be the same machine -- the point
being to delegate management of the storage to a specific
piece of software other than the UA per se).



Ok I can see that point, but I'm not sure I agree with it. So the
question we need to answer is whether the flexibility in changing the
message storage format by putting this functionality in a separate piece
of software is worth the complexity of having a special API or network
protocol to interact with it.


The description above refers to already existing protocols and
formats.  The question (eventually) will be what sort of protocol(s)
might be used in the next generation of mail systems.  It might
turn out IMAP is completely suitable, or it may need to be tweaked
a bit, or we might need something different. Likewise for POP,
SMTP, LMTP, submission, MUPDATE, and any other mail-related
protocols.

What do you mean here? Are you going to build your own file system that
is optimized for storing large amounts of small files? Current file
systems do a very bad job of this now that cluster sizes are typically
larger than a single 512 byte sector.


Cyrus IMAP works fine with ReiserFS, and probably with others. With IMAP
and the type of storage described (one file per message), there is rarely
any reason to examine every byte of every file. But that's
basically what has to be done with a mailbox-in-a-file (e.g. mbox),
and that's a very slow, inefficient process.  Moving a message
from one mailbox to another is almost trivial using file-per-message.
Doing so with a mailbox-in-a-file means extracting a message, closing
up the resulting hole by rewriting the remainder of the file, opening
up a suitable space in another file (again possibly moving some
content) and inserting the message there. All with suitable locking
mechanisms and safeguards against loss in the event of a system crash.
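
A rough sketch of the difference, using Python's standard os and mailbox modules. The function names and the directory layout are assumptions for illustration; this is not how Cyrus or any particular server does it.

```python
import mailbox
import os


def move_file_per_message(src_dir, dst_dir, name):
    """Move one message between mailboxes stored as one file per message.

    On the same file system this is a single rename; no message
    content is read or rewritten.
    """
    os.replace(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def move_mbox(src_path, dst_path, key):
    """Move one message between two mbox files.

    The message is appended to the destination, then removed from the
    source; flushing the source rewrites that file to close the gap.
    """
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    try:
        src.lock()
        dst.lock()
        dst.add(src[key])   # append a copy to the destination
        dst.flush()         # get the copy on disk before touching the source
        src.remove(key)
        src.flush()         # rewrites the source file without the message
    finally:
        dst.unlock()
        src.unlock()
        dst.close()
        src.close()
```

Copying before removing means a crash in between leaves a duplicate rather than a lost message, which is one of the safeguards the mbox case needs and the rename case gets almost for free.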

Yes, *today*. With a system that includes message sizes the body can be
skipped


IIRC, that's been tried ("Content-Length" stored in the message
header), and it has led to corrupted mailboxes.
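
A toy illustration of how a stored length can go wrong. The mailbox layout and both function names here are invented for the example and do not describe any particular MUA; the point is only that a reader that trusts the stored size loses its place as soon as the size and the body disagree.

```python
def make_message(body, declared_length=None):
    """Build one toy mailbox entry; declared_length lets the header go stale."""
    length = len(body) if declared_length is None else declared_length
    return b"\n".join([
        b"From a@example.org Sat Feb 28 00:00:00 2004",
        b"Subject: test",
        b"Content-Length: %d" % length,
        b"",
        body,
        b"",
    ])


def read_body(data, start=0):
    """Read one body by trusting Content-Length; return (body, next_offset)."""
    header_end = data.index(b"\n\n", start) + 2
    length = None
    for line in data[start:header_end].split(b"\n"):
        name, _, value = line.partition(b":")
        if name.lower() == b"content-length":
            length = int(value.strip().decode("ascii"))
    if length is None:
        raise ValueError("no Content-Length header")
    body = data[header_end:header_end + length]
    # Skip the body plus the single newline that separates entries.
    return body, header_end + length + 1
```

When the header matches the body, next_offset lands on the next "From " line. If something changes the body without updating the header (say the declared length is 15 for a 5-byte body), the returned body swallows the start of the following message and next_offset points into the middle of it.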

It would be pretty stupid to store messages such that an operation that is
done once a month can be performed easily while operations that happen
every minute suffer.


What is the basis for your statement that "operations that happen
every minute suffer"?  Do you have any data on real mail systems to
support that statement?

If the format doesn't allow grepping then I'm sure
someone will write a tool to do this. Besides, current mail clients
allow searching in email as well.


Yes, using IMAP in some cases (remember, that *is* a current,
widely-supported protocol!), or using UA-private copies of every
message in a UA-specific format.  I thought that the latter
situation was what you wanted to get away from.

Anything that displays the text between the tags is a valid HTML
implementation.


You're joking, right?

Obviously lots of bad things can happen when there is a standard.
However, none of these possibilities are a good reason to forgo
standardization.


We shouldn't pretend that standardization is a solution to every
problem.  And standardization needs to take place in an appropriate
forum.  The Internet Engineering Task Force is concerned with issues
that relate to interconnected networks, including standardization of
protocols (and that may include message format as it is transmitted
over the network).  File storage is really outside of its scope.
There may be some other organization interested in that issue (maybe
ECMA).