mhonarc-users

Re: Is MHonARC for me?

2000-11-01 14:34:18
On October 31, 2000 at 13:10, "Mathias Körber" wrote:

My requirements:
      a) The system should be able to filter out numerous duplicate
         emails. I know I could use formail -D for this. Does MHonARC
         have a native detector for duplicate emails? It would be nice
         if it could detect even duplicates that differ in their M-ID
         (eg remails etc)

On a per-archive basis, MHonArc uses message-ids to detect duplicates.
As for the last question in the paragraph, detecting duplicates that
have different message-ids requires heuristics that can be a real pain
to implement and will never be perfect.
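
To give an idea of what such a heuristic might look like, here is a
minimal Python sketch. It is not how MHonArc works internally. It drops a
message if its Message-ID was already seen, or if a fingerprint built
from the sender address, subject, and whitespace-normalized body was
already seen. The function names and the choice of fields are
assumptions made for illustration; a remailer that rewrites the subject
or appends a footer would defeat it.

```python
import hashlib
import re
from email import message_from_string
from email.utils import parseaddr


def body_text(msg):
    """Concatenate the string payloads of all non-container parts."""
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload()
        if isinstance(payload, str):
            parts.append(payload)
    return " ".join(parts)


def fallback_key(msg):
    """Fingerprint that ignores the Message-ID (hypothetical heuristic)."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    subject = re.sub(r"\s+", " ", msg.get("Subject", "")).strip().lower()
    body = re.sub(r"\s+", " ", body_text(msg)).strip()
    data = "\n".join((sender, subject, body)).encode("utf-8", "replace")
    return hashlib.sha1(data).hexdigest()


def unique_messages(raw_messages):
    """Return the raw messages with duplicates removed, keeping the first copy."""
    seen_ids, seen_keys, kept = set(), set(), []
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = (msg.get("Message-ID") or "").strip()
        key = fallback_key(msg)
        if (mid and mid in seen_ids) or key in seen_keys:
            continue  # duplicate by id, or by content fingerprint
        if mid:
            seen_ids.add(mid)
        seen_keys.add(key)
        kept.append(raw)
    return kept
```

A filter like this would run before the messages are handed to MHonArc,
in the same place you would otherwise use formail -D.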

      b) The main requirement is that the archive INDEX is accessible
         easily, so I guess it will have to reside on my server's HD
         somewhere.
         The index should be able to be stored on CD-R (or RW) for
         mobility and backup purposes.

         Minimal indexing requirements:
              Date
              From, To, CC, Bcc, Sender and their X-equivs (real names and
              addresses)

Recipient field information is not available on index pages.  The
recipient fields will show up on message pages unless you exclude them.

              Subject
              Message-ID, References
              Attachment filenames

References and attachment filenames are currently not available for
listing on index pages.  It can be done, but the resulting formatting
of these values can be a problem, since they represent a list of values
and not a single item.

              Some form of free-text indexes for the body would be nice but 
              I guess it would
                      a) create a humongous amount of data
                      b) be difficult to implement w/o also indexing
                         too common words (be, to, etc)
                      c) would require some powerful searchengine to
                         provide a useful interface..

There are several search engines out there.  This is outside
of the scope of MHonArc, but many users hook in search engines for
their archives.  Examples: htdig, glimpse, namazu.

      c) The archive (emails) itself can be stored on multiple CDs
         (CD-R or CD-RW). If the mails could be stored in a compressed
          format this would be OK with me too.

MHonArc supports gzip output.

         It would be nice if the system could split the archive
         automatically by date (eg year/month, so they can be put
         on CD separately). Mails that were duplicated in different
         periods might need a link ?

Requires a pre-processor.  Can probably be done with Procmail.
Perl is also an option.
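
As a rough illustration of such a pre-processor, here is a Python sketch
(Procmail or Perl would do equally well) that splits one mbox file into
per-month mbox files based on the Date header. Each resulting file could
then be fed to its own archive. The function names, the
"YYYY-MM.mbox" naming, and the "undated" bucket are assumptions made
for this example.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def period_of(msg):
    """Return 'YYYY-MM' from the Date header, or 'undated' if unparsable."""
    try:
        return parsedate_to_datetime(msg["Date"]).strftime("%Y-%m")
    except (TypeError, ValueError):
        return "undated"


def split_by_month(src_path, out_dir):
    """Copy each message of src_path into out_dir/<period>.mbox."""
    os.makedirs(out_dir, exist_ok=True)
    boxes = {}
    src = mailbox.mbox(src_path, create=False)
    try:
        for msg in src:
            period = period_of(msg)
            if period not in boxes:
                path = os.path.join(out_dir, period + ".mbox")
                boxes[period] = mailbox.mbox(path)
            boxes[period].add(msg)
    finally:
        for box in boxes.values():
            box.close()  # close() also flushes pending writes
        src.close()
    return sorted(boxes)
```

This does nothing about messages duplicated across periods; linking
those would need extra bookkeeping on top.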

      d) Multi-system (Unix, Windows) access to the archive is a must. This
         is why I think MHonARC might be the right tool, as HTML can be
         read by both systems.

Could restrict choice of search engine.  I know Namazu has a win32 version,
but I do not know if the index files are compatible.  Have never checked.
A Java-based search engine is an option, but I do not know if any
are available.

      e) The archive should be extensible, so that I can pipe new mails into

MHonArc was designed to add messages to an existing archive.
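
For example, with the -add option and no folder arguments, mhonarc
reads a single message from standard input and adds it to the archive
named by -outdir. A small Python wrapper around that might look like the
sketch below; the helper names and the archive path are hypothetical,
and it assumes mhonarc is on your PATH.

```python
import subprocess


def mhonarc_add_command(outdir, mhonarc="mhonarc"):
    """Build the command line for adding one message read from stdin."""
    return [mhonarc, "-add", "-outdir", outdir]


def add_message(raw_message, outdir):
    """Pipe one raw message (bytes) into an existing archive."""
    subprocess.run(mhonarc_add_command(outdir), input=raw_message, check=True)


# Hypothetical usage:
# with open("new-mail.txt", "rb") as f:
#     add_message(f.read(), "/path/to/archive")
```

The same command can be called directly from a Procmail recipe or a
.forward file instead of going through a script.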

Is there maybe a better tool than MHonARC to do this?

As is usually the case, it will probably be a combination of tools.

--ewh

  • Re: Is MHonARC for me?, Earl Hood <=