ietf-822

Re: Archives

2004-11-04 09:05:40

On Tue November 2 2004 13:04, Keith Moore wrote:

IIRC, SEEN should not change in a shared folder.

Perhaps (I can't find any definitive requirement); but SEEN
isn't the only flag (the "Recent" flag can be changed by the
server itself, even though clients are forbidden from changing
it).
 
being able to store lots of messages in a single file can save a
lot of disk space, particularly if you compress those mbox files.

I suppose that's so in general. However, for the problem
domain (a server), disk space with modern hardware
isn't generally an issue.  If it is deemed to be a greater
issue than compression/decompression overhead for a
particular instance, compression can be implemented in
the file system, independent of storage format.

that's much more difficult to do.  the point is to make this easy.

More difficult for whom?  On an OS that supports compressed
filesystems, FS compression is transparent to applications
(other than performance issues) [and there are OSes that
support FS compression].  On the other hand, if compression
is implemented at the application level, every application that
accesses the files has to be able to use the same type of
compression and decompression, and has to be explicitly
coded to do so.  Where multiple access methods are used,
that means that it has to be handled by ftp servers, http
servers, etc. as well as support tools (does "ucbmail" support
editing of compressed mbox files?).
  
I think one of the biggest practical issues with Archived-At
support is arranging for the URI(s) to be put into the
message before the message is placed in the archive; the
URI (related to archive file name and/or UID) might not
be determinate until the message is actually placed in the
archive.

that's fairly easy to solve.  send the messages to the archiver 
before distributing them to list subscribers.


              +----------+       +----------+       +----------+
incoming      | incoming |       |          |       |   list   |
message ----> |   mail   | ----> | archiver | ----> | expander |
              |  filter  |       |          |       |          |
              +----------+       +----------+       +----------+
                    |                  |                  |
                    V                  V                  V
                  rejects          archived          subscribers
                                   messages

That doesn't address the issue of how the URIs that are to be put
in Archived-At fields are generated before the archive file path
and/or UID are known, nor how the field gets inserted in the
message.  Sure, list expansion can happen after archival; that's
the easy part.

either that or base the name of the archive on some property of
the message that can be derived independently, say the message-id
or a hash of some kind.

"The name of the archive" may well be different for different
access schemes (and one would like to use the same underlying
files to avoid duplicate (triplicate, etc.) storage of messages).
It seems that either the final location, file name, and UID (for
POP and IMAP access) would need to be determinate
before the files are placed in the archive, or the Archived-At
field would have to be generated and inserted in the messages
in situ after they're placed in the archive (leaving a time
window during which the messages appear (e.g. to users
browsing a folder) but do not yet have an Archived-At field).
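As a rough illustration of the first alternative, here is a minimal Python sketch in which the archiver derives the URI from the Message-ID alone, so the field can be added before the message is stored. The function name, the base URL, and the choice of a SHA-1 hex digest are all assumptions made for the example; nothing here is prescribed by the Archived-At proposal.

```python
import hashlib
from email import message_from_bytes


def add_archived_at(raw: bytes, base_url: str) -> bytes:
    """Return a copy of the raw message with an Archived-At field prepended.

    The URI is computed from the Message-ID only, so it is known before
    the archive decides on a file name or UID. (Hypothetical scheme.)
    """
    # Parse only to read the Message-ID; the original bytes are left untouched.
    message_id = message_from_bytes(raw)["Message-ID"]
    if message_id is None:
        raise ValueError("message has no Message-ID field")

    key = hashlib.sha1(str(message_id).strip().encode("utf-8")).hexdigest()

    # Match the line-ending convention already used by the message.
    eol = b"\r\n" if b"\r\n" in raw else b"\n"
    field = f"Archived-At: <{base_url}{key}>".encode("ascii") + eol
    return field + raw
```

This only works if every access scheme can map the same key to the stored message; where a scheme assigns its own identifiers (IMAP UIDs, for instance), the second alternative or a lookup table would still be needed.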

Different schemes have different restrictions on the characters
allowable in UIDs and pathnames; also in lengths of identifiers,
etc., which would have to be taken into consideration in any
method that used message-ids or hashes. For example, POP
UIDs are limited in length to a maximum of 70 octets from a
specific set of characters (RFC 1939); many of those characters
are reserved or otherwise excluded from use in http or ftp URIs.
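To make the constraint concrete, the following Python sketch checks a candidate identifier against the POP limit (1 to 70 characters in the range 0x21 to 0x7E, per RFC 1939) and against a deliberately conservative URI-safe set (letters, digits, "-", ".", "_", "~"). The helper names and the conservative character set are assumptions for illustration; a SHA-256 hex digest is just one encoding that happens to satisfy both.

```python
import hashlib
import string

POP_UID_MAX = 70  # RFC 1939: unique-id is 1 to 70 characters
# Conservative set assumed safe in http/ftp URI path segments (an assumption).
URI_SAFE = set(string.ascii_letters + string.digits + "-._~")


def is_valid_pop_uid(uid: str) -> bool:
    """True if uid fits the RFC 1939 length and character-range limits."""
    return 1 <= len(uid) <= POP_UID_MAX and all(0x21 <= ord(c) <= 0x7E for c in uid)


def is_uri_safe(uid: str) -> bool:
    """True if uid uses only characters from the conservative set above."""
    return bool(uid) and all(c in URI_SAFE for c in uid)


def portable_id(message_id: str) -> str:
    """Hypothetical identifier: 64 hex digits derived from the Message-ID."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


mid = "<20041102130400.GA1234@example.org>"
print(is_valid_pop_uid(mid), is_uri_safe(mid))   # raw Message-ID
pid = portable_id(mid)
print(is_valid_pop_uid(pid), is_uri_safe(pid))   # hashed form
```

A raw Message-ID can pass the POP check yet still contain "<", ">" and "@", which need escaping or are excluded in URIs, and longer Message-IDs exceed 70 characters outright; the hashed form avoids both problems at the cost of no longer being human-readable.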

