Re: msgid instead of seq. number for output files

1998-07-30 22:52:46
On July 30, 1998 at 08:09, Christopher Lindsey wrote:

Ah, before I get asked: If no message-id is given mhonarc should/will
create its own id on the fly as it does now.

If it does this (and it should do it now for the duplicate message checking),

MHonArc adding IDs will not help duplicate message checking.  It helps in
other ways.

checks should be made for RFC-compliant Message-Id: headers.  A lot of 
messages that I get from misconfigured relays don't send unique Message-Ids,
therefore breaking the duplicate message checking.

Yes, that is a problem.
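
To illustrate the kind of check being suggested, here is a minimal
sketch in Python (not MHonArc code). The helper name and the pattern
are assumptions: the pattern only loosely approximates the
"<local-part@domain>" shape of an RFC 822 msg-id. It tests syntax
only, so it cannot detect a relay that reuses a well-formed id.

```python
import re

# Loose approximation of the msg-id shape "<local-part@domain>".
# Hypothetical helper for illustration; not a full RFC 822 parser.
MSGID_RE = re.compile(r"<[^\s<>@]+@[^\s<>@]+>")

def looks_like_msgid(value):
    """Return True if value has the rough shape <local@domain>."""
    return MSGID_RE.fullmatch(value.strip()) is not None
```

A header value that fails this test could be treated as if no
Message-Id had been given at all.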

Of course, not everyone would like to use md5sums for this.  It could
really slow things down if you were adding 10000 messages to the archive
and needed to calculate a sum for each one.  So what about the possibility
of choosing which header you want to use for duplicate checking?  Is
that easy to create a resource for?  Is it extensible to Achim's 

Something like FROMFIELDS can be done.  However, I will need more
information on the requirements to make it effective.  For example,
is a simple string compare sufficient, or is something more
elaborate needed?  Can I take the two md5sums and just do a string
compare to determine uniqueness?  Or are additional computations
needed, depending on the fields that are being evaluated?
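
For the simple case, the idea could be sketched as follows in Python
(not MHonArc code). The function name, the whitespace normalization,
and the choice of fields are all assumptions made for illustration.
Each message is reduced to an MD5 hex digest over the selected header
fields, and two messages count as duplicates when the digest strings
are equal.

```python
import hashlib
from email import message_from_string

def fields_digest(raw_message, fields):
    """MD5 hex digest over the named header fields of one message."""
    msg = message_from_string(raw_message)
    md5 = hashlib.md5()
    for name in fields:
        value = msg.get(name, "")
        # Collapse whitespace so header folding does not change the sum.
        normalized = " ".join(str(value).split())
        md5.update(name.lower().encode("utf-8") + b"\0")
        md5.update(normalized.encode("utf-8") + b"\0")
    return md5.hexdigest()

def is_duplicate(raw_a, raw_b, fields=("From", "Date", "Subject")):
    """Plain string compare of the two digests."""
    return fields_digest(raw_a, fields) == fields_digest(raw_b, fields)
```

Under these assumptions a string compare is enough. Anything more
elaborate, such as ignoring case or "Re:" prefixes, would have to go
into the normalization step before the sum is computed.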

BTW, I prefer not to have MHonArc do anything like computing md5sums.
You cited the main reason: performance.  If md5sums are needed, the
user should do it via the MTA or some other program where it can be
done more efficiently.  It also promotes a division of labor.


             Earl Hood              | University of California: Irvine
      ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu      |      Electronic Loiterer
                                    | Dabbler of SGML/WWW/Perl/MIME