Re: mailbox format(s)
2004-02-27 11:23:50
On 27 feb 2004, at 0:26, Bruce Lilly wrote:
>> I quite strongly agree with Iljitsch on this one.
>> Having a well-defined format that can be used for files
>> is very helpful. The biggest advantage would be that it
>> makes it easier to switch MUAs.
> If the mailbox is accessed via a standard network protocol such as
> POP or IMAP, it is trivially easy to switch MUAs.
Not quite. Then the new MUA must download all the messages again from
the server. Note that ISPs often don't allow customers to use their
mailbox for long-term storage, so in practice this doesn't work at all
for many users. The alternative would be to run a local IMAP-like
service, which seems excessive and still doesn't allow the use of more
than one program for downloading mail, just more than one program for
displaying it.
> A single format might not scale
> well; what works for an organization with plenty of resources
> might not work at all for a guy with a PDA or cell phone (and
> vice versa).
I have more than 300 MB worth of mail on my server. My laptop keeps a
complete copy of this, but I don't think that makes sense for a PDA to
do the same: just caching the most recently read messages would be a
better choice.
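Caching only the most recently read messages on a small device amounts to a least-recently-used cache keyed by msgid. The Python sketch below is a minimal illustration: the class name is hypothetical, and capacity is counted in messages rather than bytes, which is a simplifying assumption (a real PDA client would more likely budget by storage size).

```python
from collections import OrderedDict
from typing import Optional


class RecentMessageCache:
    """Keep only the most recently read messages; evict the least recently read."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # number of messages, not bytes (an assumption)
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, msgid: str) -> Optional[bytes]:
        """Return a cached message, or None if it has to be fetched from the server."""
        if msgid not in self._items:
            return None
        self._items.move_to_end(msgid)  # mark as most recently read
        return self._items[msgid]

    def put(self, msgid: str, message: bytes) -> None:
        self._items[msgid] = message
        self._items.move_to_end(msgid)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently read message


cache = RecentMessageCache(capacity=2)
cache.put("<a@example.org>", b"message A")
cache.put("<b@example.org>", b"message B")
cache.get("<a@example.org>")                # A is now the most recently read
cache.put("<c@example.org>", b"message C")  # evicts B, not A
```

The full 300 MB copy stays on the server (or the laptop); the device only ever holds `capacity` messages.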
>> The format could come in single-message and multiple-message (like
>> current mbox) variants. The latter should just be a concatenation
>> of the former, or otherwise a very trivial transform.
> Been there, done that. Never again. A flat file just doesn't
> work well with even a modest number of messages.
It can work if you build an index and don't go around removing messages
from the middle of the file too often.
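A minimal Python sketch of such an index, assuming the common mbox convention that a message starts at a line beginning with "From " that is either the first line of the file or follows a blank line. The function names are hypothetical; the point is that one linear pass records byte offsets, after which any message can be reached with a single seek.

```python
import io
from typing import BinaryIO, List


def build_mbox_index(f: BinaryIO) -> List[int]:
    """Return the byte offset at which each message in an mbox-style file starts."""
    offsets: List[int] = []
    previous_blank = True  # the first line of the file counts as a boundary
    pos = 0
    for line in f:  # binary iteration yields raw lines, newline included
        if previous_blank and line.startswith(b"From "):
            offsets.append(pos)
        previous_blank = line in (b"\n", b"\r\n")
        pos += len(line)
    return offsets


def read_message(f: BinaryIO, offsets: List[int], n: int) -> bytes:
    """Jump straight to message n using the index instead of rescanning."""
    start = offsets[n]
    f.seek(start)
    if n + 1 < len(offsets):
        return f.read(offsets[n + 1] - start)
    return f.read()


mbox = io.BytesIO(
    b"From alice@example.org Fri Feb 27 00:26:00 2004\n"
    b"Subject: one\n"
    b"\n"
    b"first body\n"
    b"\n"
    b"From bob@example.org Fri Feb 27 11:23:50 2004\n"
    b"Subject: two\n"
    b"\n"
    b"second body\n"
)
index = build_mbox_index(mbox)
second = read_message(mbox, index, 1)
```

Removing a message from the middle still means rewriting everything after it and recomputing the offsets, which is why it shouldn't happen too often.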
> Cyrus IMAP stores one
> message per file, with a database for metadata (access lists,
> etc.), and it's quite fast.
Maybe for random access, but if you need to access all messages, it's
bound to be slower than reading them from a single file. Also, the
per-file overhead in the file system makes this a pretty bad idea.
What I imagine is a system where messages are stored in a binary
container format such as IFF/AIFF/ASF/AVI when they are created. A
typical message would start with a header section (which includes the
msgid), then a body section consisting of one or more text and/or
binary parts and finally an optional signature section. These are
concatenated and the whole thing is flagged as immutable in transit.
The container format makes sure that it's easy to skip ahead to the
part of the message that is of interest at any particular time. Then
there are three other parts: an end-to-end control section, a local
control section and any information left by intermediate systems. These
sections can either be kept in a separate place and be linked to the
main section through the use of the msgid, or they can simply be tacked
on at the end of it.
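To make the layout concrete, here is a Python sketch under loudly stated assumptions: chunks are loosely modelled on IFF (4-byte ID, 4-byte big-endian length, data, one pad byte when the length is odd); all chunk IDs ("MSG ", "HEAD", "TEXT", "SIGN", "LOCL" and so on) are hypothetical, since the proposal names none; the control sections are simply tacked on at the end; and the "immutable in transit" flag is not modelled.

```python
import struct
from typing import Iterator, Optional, Sequence, Tuple

# Hypothetical 4-byte chunk IDs; the proposal does not name any.
HEADER, TEXT, BINARY, SIGNATURE = b"HEAD", b"TEXT", b"BIN ", b"SIGN"
END_TO_END, LOCAL, TRANSIT = b"E2EC", b"LOCL", b"TRAN"


def chunk(ck_id: bytes, data: bytes) -> bytes:
    """IFF-style chunk: 4-byte ID, 4-byte big-endian length, data, pad byte if odd."""
    assert len(ck_id) == 4
    return ck_id + struct.pack(">I", len(data)) + data + b"\x00" * (len(data) % 2)


def iter_chunks(buf: bytes, start: int = 0, end: Optional[int] = None
                ) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (id, data_offset, data_length) per chunk; lengths let us jump over data."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        ck_id, length = struct.unpack_from(">4sI", buf, pos)
        yield ck_id, pos + 8, length
        pos += 8 + length + (length % 2)  # skip data and any pad byte


def build_message(header: bytes, parts: Sequence[Tuple[bytes, bytes]],
                  signature: bytes = b"",
                  control: Sequence[Tuple[bytes, bytes]] = ()) -> bytes:
    """Header, body parts and optional signature, then control sections at the end."""
    inner = chunk(HEADER, header) + b"".join(chunk(kind, data) for kind, data in parts)
    if signature:
        inner += chunk(SIGNATURE, signature)
    inner += b"".join(chunk(ck_id, data) for ck_id, data in control)
    return chunk(b"MSG ", inner)


msg = build_message(
    b"Message-ID: <1@example.org>\r\nSubject: hi\r\n",
    [(TEXT, b"hello")],
    control=[(LOCAL, b"\\Seen")],
)
(outer_id, offset, length), = iter_chunks(msg)
sections = [ck_id for ck_id, _, _ in iter_chunks(msg, offset, offset + length)]
```

Because every section carries its own length, a reader interested only in the header or the control sections never has to look inside the body parts.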
A mailbox can then simply consist of a number of these messages that
may or may not be concatenated into larger files. The control sections
can be split off or copied to a different location and be used as an
index if this is desired. But it's not really necessary to do that as
searching (on header fields) through a large mailbox is fairly
efficient: read the header, skip the body, read the control sections,
next message. (The length of each section is specified in the container
format so there is no need to parse the whole body and no 8bit
cleanliness issues.)
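The search loop described here (read the header, skip the body, read the control sections, next message) might look like the following Python sketch. It assumes a hypothetical IFF-style layout in which each message is an outer "MSG " chunk containing "HEAD", body ("TEXT", "BIN ", "SIGN") and control chunks; none of these IDs come from the proposal. Body chunks are skipped with `seek()`, so their bytes are never read or parsed.

```python
import io
import struct
from typing import BinaryIO, Callable, Dict, Iterator, Tuple

# Hypothetical chunk IDs; the proposal does not define any.
MSG, HEADER = b"MSG ", b"HEAD"
BODY_IDS = {b"TEXT", b"BIN ", b"SIGN"}  # immutable content the scan never reads


def search_headers(
    f: BinaryIO, predicate: Callable[[bytes], bool]
) -> Iterator[Tuple[int, bytes, Dict[bytes, bytes]]]:
    """Yield (offset, header, control sections) for each message whose header matches."""
    while True:
        msg_offset = f.tell()
        top = f.read(8)
        if len(top) < 8:
            return  # end of mailbox
        top_id, msg_len = struct.unpack(">4sI", top)
        data_end = msg_offset + 8 + msg_len
        header, control = b"", {}
        while top_id == MSG and f.tell() + 8 <= data_end:
            ck_id, length = struct.unpack(">4sI", f.read(8))
            padded = length + (length % 2)  # IFF-style pad byte for odd lengths
            if ck_id in BODY_IDS:
                f.seek(padded, io.SEEK_CUR)  # skip the body without reading it
            else:
                data = f.read(padded)[:length]
                if ck_id == HEADER:
                    header = data
                else:
                    control[ck_id] = data  # end-to-end, local or transit info
        f.seek(data_end + (msg_len % 2))  # start of the next message
        if top_id == MSG and predicate(header):
            yield msg_offset, header, control


# Usage: build a two-message mailbox in memory with a small helper.
def _chunk(ck_id: bytes, data: bytes) -> bytes:
    return ck_id + struct.pack(">I", len(data)) + data + b"\x00" * (len(data) % 2)


mailbox = io.BytesIO(
    _chunk(MSG, _chunk(HEADER, b"Subject: budget\r\n") + _chunk(b"TEXT", b"x" * 1001)
           + _chunk(b"LOCL", b"\\Seen"))
    + _chunk(MSG, _chunk(HEADER, b"Subject: lunch\r\n") + _chunk(b"TEXT", b"y"))
)
hits = list(search_headers(mailbox, lambda h: b"budget" in h))
```

The generator assumes the caller does not move the file position between results; a production version would also need to validate lengths against the file size.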
I think neither one message per file nor all messages in one file is
the best idea; grouping messages in files of a few hundred kB or
several MB is probably better. We can add some padding after any
embedded control/logging sections (which can just as easily be trailers
as headers) so that when those sections grow, space can be borrowed
from the padding sections.
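Borrowing from the padding can be done entirely in place, without shifting the rest of the file. The Python sketch below assumes a hypothetical IFF-style layout in which a local control chunk ("LOCL") is immediately followed by a padding chunk ("PAD "); both IDs and the function name are illustrative. When the control data grows (or shrinks), the padding chunk is rewritten to cover exactly the leftover space; if the new data doesn't fit, the function reports failure and the message would have to be moved.

```python
import struct

# Hypothetical chunk IDs for the local control section and the padding after it.
LOCAL, PADDING = b"LOCL", b"PAD "


def padded(n: int) -> int:
    """IFF-style chunks are padded to an even length."""
    return n + (n % 2)


def rewrite_control_in_place(buf: bytearray, pos: int, new_data: bytes) -> bool:
    """Replace the control chunk at buf[pos] with new_data, borrowing from (or
    returning space to) the padding chunk that directly follows it.

    Returns False and leaves buf untouched if the data does not fit.
    """
    ck_id, old_len = struct.unpack_from(">4sI", buf, pos)
    pad_pos = pos + 8 + padded(old_len)
    pad_id, pad_len = struct.unpack_from(">4sI", buf, pad_pos)
    if ck_id != LOCAL or pad_id != PADDING:
        raise ValueError("expected a control chunk followed by a padding chunk")
    region_end = pad_pos + 8 + padded(pad_len)  # bytes we are allowed to reuse
    ctrl_end = pos + 8 + padded(len(new_data))
    remaining = region_end - ctrl_end
    if remaining < 8:  # keep room for at least an empty padding chunk header
        return False
    struct.pack_into(">4sI", buf, pos, LOCAL, len(new_data))
    buf[pos + 8:ctrl_end] = new_data + b"\x00" * (len(new_data) % 2)
    struct.pack_into(">4sI", buf, ctrl_end, PADDING, remaining - 8)
    buf[ctrl_end + 8:region_end] = bytes(remaining - 8)
    return True


def _chunk(ck_id: bytes, data: bytes) -> bytes:
    return ck_id + struct.pack(">I", len(data)) + data + b"\x00" * (len(data) % 2)


buf = bytearray(_chunk(LOCAL, b"\\Seen") + _chunk(PADDING, bytes(64)) + b"next message")
size_before = len(buf)
grew = rewrite_control_in_place(buf, 0, b"\\Seen \\Answered")  # fits: borrows padding
too_big = rewrite_control_in_place(buf, 0, b"x" * 200)         # does not fit
```

The file never changes size and the following message never moves, which is what makes mid-sized multi-message files cheap to update.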
Now obviously this is just one idea and I'm not saying an eventual
solution must be like this, I'm just trying to show there are more
ways to skin a cat.
>> On the way out, not worth caring about too much
>> (as Iljitsch said, nobody should be forced to use
>> a format).
> If it won't work for some systems (remember that bit about
> heterogeneity), what is the point of having a standard?
Standards improve quality, because they usually eliminate inferior ways
of achieving a result. And often vendors that implement their own
solution also support the standard to some extent in order to be
compatible.
- RE: SMTP and multicasting, (continued)
- Re: mailbox format(s), Martin Duerst
- Re: mailbox format(s), Bruce Lilly
- Re: mailbox format(s), Brett Watson
- Re: mailbox format(s), Iljitsch van Beijnum (this message)
- Re: mailbox format(s), Bruce Lilly
- Re: mailbox format(s), Iljitsch van Beijnum
- Re: mailbox format(s), Bruce Lilly
- Re: mailbox format(s), Iljitsch van Beijnum
- Re: mailbox format(s), Richard Welty
- Goal: easier cpu parsable opt out tags, Doug Royer
- Re: Goal: easier cpu parsable opt out tags, Brett Watson
- Re: mailbox format(s), Bruce Lilly
- Re: SMTP and multicasting, Hadmut Danisch