
Re: Archives

2004-11-04 09:49:13

being able to store lots of messages in a single file can save a
lot of disk space, particularly if you compress those mbox files.

I suppose that's so in general. However, for the problem
domain (a server), disk space on modern hardware
isn't generally an issue. If it is deemed to be a greater
issue than compression/decompression overhead for a
particular instance, compression can be implemented in
the file system, independent of the storage format.

that's much more difficult to do.  the point is to make this easy.

More difficult for whom? 

for the server. 

On an OS that supports compressed
filesystems, FS compression is transparent to applications
(other than performance issues) [and there are OSes that
support FS compression].  On the other hand, if compression
is implemented at the application level, every application that
accesses the files has to be able to use the same type of
compression and decompression, and has to be explicitly
coded to do so. 

yes that's true but it doesn't seem to be a huge problem in practice.
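
To illustrate the point about application-level compression, here is a minimal Python sketch of what a tool has to do to read a gzip-compressed mbox file. The function name `read_subjects` and the choice of gzip are assumptions made for illustration; nothing in this thread specifies a compression format.

```python
import gzip
import mailbox
import os
import shutil
import tempfile


def read_subjects(gz_path):
    """Return the Subject of every message in a gzip-compressed mbox file.

    mailbox.mbox works on a file path, so the application has to
    decompress explicitly before it can parse anything.
    """
    # Decompress into a temporary plain mbox file.
    with tempfile.NamedTemporaryFile(suffix=".mbox", delete=False) as tmp:
        with gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
        tmp_path = tmp.name
    try:
        box = mailbox.mbox(tmp_path, create=False)
        try:
            return [msg["Subject"] for msg in box]
        finally:
            box.close()
    finally:
        os.remove(tmp_path)
```

With a compressed filesystem the decompression step disappears from the application; here every reader has to repeat it and agree on the format.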

Where multiple access methods are used,
that means that it has to be handled by ftp servers, http
servers, etc. as well as support tools (does "ucbmail" support
editing of compressed mbox files?).

we don't need to concern ourselves with solutions that are 
confined to the server.  what we need to do is define the interface
between the client and the server, such that there is a reasonable
balance of burden between the client and the server.   so if I
were to propose a spec that allowed mail archives to be compressed,
I'd assume that the client was going to do the decompression.
otherwise it's purely a server-side issue and not the client's
concern.  at this point I'm nowhere near ready to propose a spec,
I'm just thinking about it.  it's not at all clear to me where
the right tradeoff is yet.

I think one of the biggest practical issues with Archived-At
support is arranging for the URI(s) to be put into the
message before the message is placed in the archive; the
URI (related to archive file name and/or UID) might not
be determinate until the message is actually placed in the
archive.
that's fairly easy to solve.  send the messages to the archiver 
before distributing them to list subscribers.

              +----------+       +----------+       +----------+
incoming      | incoming |       |          |       |   list   |
message ----> |   mail   | ----> | archiver | ----> | expander |
              |  filter  |       |          |       |          |
              +----------+       +----------+       +----------+
                    |                  |                  |
                    V                  V                  V
                  rejects          archived          subscribers

That doesn't address the issue of how the URIs that are to be put
in Archived-At fields are generated before the archive file path
and/or UID are known, nor how the field gets inserted in the
message.  Sure, list expansion can happen after archival; that's
the easy part.

the archiver determines the archive file path and adds the archived-at
field to the message before sending it to the list expander.

either that or base the name of the archive on some property of
the message that can be derived independently, say the message-id
or a hash of some kind.

"The name of the archive" may well be different for different
access schemes 

because the "name" is a URL, it _will_ be different for different 
access schemes.  I'm expecting that the archiver will have to know
about the access methods that are available.
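
As a sketch of what "knowing about the access methods" might look like, the archiver could keep a table of URL templates, one per scheme, and emit one URI for each. The hosts, paths, and names below are hypothetical.

```python
from urllib.parse import quote

# Hypothetical table of access methods the archiver has been configured with.
ACCESS_METHODS = {
    "http": "http://archive.example.org/lists/{list}/{key}",
    "ftp": "ftp://ftp.example.org/pub/lists/{list}/{key}",
}


def archive_uris(list_name, key, methods=ACCESS_METHODS):
    """Return one URI per configured access scheme, in scheme order."""
    return [
        template.format(list=quote(list_name, safe=""), key=quote(key, safe=""))
        for _, template in sorted(methods.items())
    ]
```

Each returned URI could then go into its own Archived-At field.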

