On October 31, 2000 at 13:10, "Mathias Körber" wrote:
a) The system should be able to filter out numerous duplicate
emails. I know I could use formail -D for this. Does MHonARC
have a native detector for duplicate emails? It would be nice
if it could detect even duplicates that differ in their M-ID
(eg remails etc)
MHonArc uses Message-IDs to detect duplicates, on a per-archive basis.
As for the last question in that paragraph, catching duplicates whose
Message-IDs differ requires heuristics that are a real pain to
implement and will never be perfect.
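To give an idea of what such a heuristic might look like, here is a minimal Python sketch of a pre-filter that runs before messages reach the archive. It is not part of MHonArc. The function names (`content_key`, `is_duplicate`) and the choice of fingerprint (sender address, subject, and whitespace-normalized plain-text body) are assumptions made for illustration. A remailer that rewrites the subject or appends a footer would defeat it, which is exactly the imperfection mentioned above.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr


def content_key(msg):
    """Fingerprint a parsed message by sender address, subject, and body text."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    subject = " ".join(msg.get("Subject", "").split()).lower()
    chunks = []
    for part in msg.walk():
        # Only plain-text parts contribute; attachments are ignored here.
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is not None:
                chunks.append(payload.decode("utf-8", errors="replace"))
    # Collapse whitespace so re-wrapped copies still produce the same key.
    body = " ".join(" ".join(chunks).split())
    joined = "\n".join([sender, subject, body])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def is_duplicate(raw_message, seen_ids, seen_keys):
    """Return True if the Message-ID or the content fingerprint was seen before.

    Both sets are updated in place, so call this once per incoming message.
    """
    msg = message_from_string(raw_message)
    msg_id = (msg.get("Message-ID") or "").strip()
    key = content_key(msg)
    duplicate = (bool(msg_id) and msg_id in seen_ids) or key in seen_keys
    if msg_id:
        seen_ids.add(msg_id)
    seen_keys.add(key)
    return duplicate
```

The Message-ID check mirrors what MHonArc already does per archive; the fingerprint is the only extra step, and the two sets would need to be saved between runs if mail arrives incrementally.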
b) The main requirement is that the archive INDEX is accessible
easily, so I guess it will have to reside on my server's HD
The index should be able to be stored on CD-R (or RW) for
mobility and backup purposes.
Minimal indexing requirements:
From, To, CC, Bcc,Sender and their X-equivs (real names and add
Recipient field information is not available on index pages. It
will show up on message pages unless you exclude it.
References and attachment filenames are currently not available for
listing on index pages. It can be done, but the resulting
formatting of these values can be a problem, since they represent a
list of values and not a single item.
Some form of free-text indexes for the body would be nice but
I guess it would
a) create a humongous amount of data
b) be difficult to implement w/o also indexing
too common words (be, to, etc)
c) would require some powerful searchengine to
provide a useful interface..
There are several search engines out there. This is outside
of the scope of MHonArc, but many users hook in search engines for
their archives. Examples: htdig, glimpse, namazu.
c) The archive (emails) itself can be stored on multiple CDs
(CD-R or CD-RW). If the mails could be stored in a compressed
format this would be OK with me too.
MHonArc supports gzip output.
It would be nice if the system could split the archive
automatically by date (eg year/month, so they can be put
on CD separately). Mails that were duplicated in different
periods might need a link ?
Requires a pre-processor. Can probably be done with Procmail.
Perl is also an option.
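As a rough illustration of the kind of pre-processor meant here, the following Python sketch sorts raw messages into year/month buckets based on their Date header. It is only a sketch under stated assumptions: the names `period_of` and `group_by_period` and the "undated" fallback bucket are hypothetical, and the same logic could just as well be written as a Procmail recipe or a Perl script.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def period_of(raw_message, fallback="undated"):
    """Return a 'YYYY/MM' bucket name taken from the message's Date header."""
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    if not date_header:
        return fallback
    try:
        when = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable dates raise TypeError or ValueError depending on
        # the Python version, so both are treated as "no usable date".
        return fallback
    return f"{when.year:04d}/{when.month:02d}"


def group_by_period(raw_messages):
    """Map each 'YYYY/MM' bucket to the list of raw messages that belong to it."""
    buckets = defaultdict(list)
    for raw in raw_messages:
        buckets[period_of(raw)].append(raw)
    return dict(buckets)
```

Each bucket could then be written to its own mailbox and archived into its own output directory, giving one archive per period that fits on a separate CD. Links between a message and its duplicate in another period would still have to be handled separately.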
d) Multi-system (Unix, Windows) access to the archive is a must. This
is why I think MHonARC might be the right tool, as HTML can be
read by both systems.
Could restrict choice of search engine. I know Namazu has a win32 version,
but I do not know if the index files are compatible. Have never checked.
A Java-based search engine is an option, but I do not know if any
e) The archive should be extensible, so that I can pipe new mails into
MHonArc was designed to add messages to an existing archive.
Is there maybe a better tool than MHonARC to do this?
As is usually the case, it will probably be a combination of tools.