On October 31, 2000 at 13:10, "Mathias Körber" wrote:
My requirements:
a) The system should be able to filter out numerous duplicate
emails. I know I could use formail -D for this. Does MHonARC
have a native detector for duplicate emails? It would be nice
if it could detect even duplicates that differ in their M-ID
(eg remails etc)
On a per-archive basis, MHonArc uses message-ids to detect duplicates.
As for the last question in the paragraph, detecting duplicates whose
message-ids differ requires heuristics that are a real pain to
implement and will never be perfect.
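To illustrate the kind of heuristic meant here, the following is a minimal sketch of a pre-filter that fingerprints messages by sender address, subject, and body instead of by message-id. It is not something MHonArc does; the function names `fingerprint` and `drop_duplicates` are made up for this example, and it assumes single-part plain-text messages (multipart bodies are treated as empty). It also shows why such heuristics are never perfect: a remailer that appends a footer or rewrites the subject defeats it.

```python
import hashlib
import re
from email import message_from_string
from email.utils import parseaddr


def fingerprint(raw_message: str) -> str:
    """Hash sender, subject, and body; the Message-ID is ignored on purpose."""
    msg = message_from_string(raw_message)
    # Compare only the address, so a changed display name does not matter.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    # Collapse runs of whitespace and ignore case in the subject.
    subject = " ".join(msg.get("Subject", "").split()).lower()
    # Assumption: single-part text messages. A multipart payload is a list,
    # which this sketch does not try to walk.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""
    body = re.sub(r"\s+", " ", body).strip()

    digest = hashlib.sha256()
    for part in (sender, subject, body):
        digest.update(part.encode("utf-8", "replace"))
        digest.update(b"\0")  # separator so fields cannot run together
    return digest.hexdigest()


def drop_duplicates(raw_messages):
    """Keep the first message seen for each fingerprint, in original order."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = fingerprint(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

A filter like this would run before the messages are handed to MHonArc, in the same place where formail -D would otherwise sit.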
b) The main requirement is that the archive INDEX is accessible
easily, so I guess it will have to reside on my server's HD
somewhere.
The index should be able to be stored on CD-R (or RW) for
mobility and backup purposes.
Minimal indexing requirements:
Date
From, To, CC, Bcc, Sender and their X-equivs (real names and
addresses)
Recipient field information is not available on index pages. It
will show up on message pages unless you exclude it.
Subject
Message-ID, References
Attachment filenames
References and attachment filenames are currently not available for
listing on index pages. It can be done, but the resulting
formatting of these values can be a problem, since they represent a
list of values and not a single item.
Some form of free-text indexes for the body would be nice but
I guess it would
a) create a humongous amount of data
b) be difficult to implement w/o also indexing
too common words (be, to, etc)
c) would require some powerful searchengine to
provide a useful interface..
There are several search engines out there. This is outside
of the scope of MHonArc, but many users hook in search engines for
their archives. Examples: htdig, glimpse, namazu.
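The worry about indexing overly common words is normally handled with a stop-word list. The sketch below is only a toy illustration of that idea, not how htdig, glimpse, or namazu work internally; the stop-word list and the names `build_index` and `search` are assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "be", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and return its words minus the stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def build_index(bodies):
    """Map each remaining word to the set of message keys containing it."""
    index = defaultdict(set)
    for key, text in bodies.items():
        for word in tokenize(text):
            index[word].add(key)
    return index


def search(index, query):
    """Return the messages that contain every non-stop word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Even with stop words removed, an index like this grows with the vocabulary of the archive, which is one reason to leave the job to a dedicated search engine.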
c) The archive (emails) itself can be stored on multiple CDs
(CD-R or CD-RW). If the mails could be stored in a compressed
format this would be OK with me too.
MHonArc supports gzip output.
It would be nice if the system could split the archive
automatically by date (eg year/month, so they can be put
on CD separately). Mails that were duplicated in different
periods might need a link ?
This requires a pre-processor. It can probably be done with Procmail.
Perl is also an option.
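As a rough idea of what such a pre-processor does, here is a minimal sketch that sorts raw messages into year/month buckets based on their Date header. It is written in Python purely for illustration (Procmail or Perl would do the same job); the names `month_key` and `group_by_month` and the "undated" bucket are assumptions of this example, not part of any existing tool.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def month_key(raw_message: str) -> str:
    """Return 'YYYY/MM' from the Date header, or 'undated' if unusable."""
    msg = message_from_string(raw_message)
    try:
        sent = parsedate_to_datetime(msg.get("Date", ""))
    except (TypeError, ValueError):
        # Missing or malformed Date header.
        return "undated"
    return f"{sent.year:04d}/{sent.month:02d}"


def group_by_month(raw_messages):
    """Group messages by year/month, keeping their original order."""
    groups = defaultdict(list)
    for raw in raw_messages:
        groups[month_key(raw)].append(raw)
    return dict(groups)
```

Each bucket could then be fed to a separate archive directory, one per CD. The month is taken from the sender's local date as written in the header; no time zone conversion is attempted.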
d) Multi-system (Unix, Windows) access to the archive is a must. This
is why I think MHonARC might be the right tool, as HTML can be
read by both systems.
This could restrict the choice of search engine. I know Namazu has a Win32
version, but I do not know if the index files are compatible. I have never
checked.
A Java-based search engine is an option, but I do not know if any
are available.
e) The archive should be extensible, so that I can pipe new mails into
MHonArc was designed to add messages to an existing archive.
Is there maybe a better tool than MHonARC to do this?
As is usually the case, it will probably be a combination of tools.
--ewh