On September 3, 1998 at 13:10, Claire McNab wrote:
For this reason I prefer MD5 sums of the message body -- there is
statistically only a 1:2^128 (about 1 in 3.4 x 10^38) chance of matching a
false positive. For anyone interested, I've written some sendmail
8.9.1 patches to add md5sums at the sendmail level (based on the work of
Martin Hamilton) and also have a procmail recipe to do the same.
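
As a rough illustration of the idea of an MD5 sum over the message body, here is a minimal Python sketch. It is not the sendmail patch or the procmail recipe mentioned above. The function name body_md5 and the rule "the body is everything after the first blank line" are assumptions made for the example.

```python
import hashlib


def body_md5(raw_message: str) -> str:
    """Return the hex MD5 digest of a message body, ignoring the headers.

    Assumption: the body is everything after the first blank line.
    """
    # Normalise line endings so the same body hashes the same everywhere.
    text = raw_message.replace("\r\n", "\n")
    _headers, _sep, body = text.partition("\n\n")
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# Two messages with different headers but the same body.
msg_a = "Message-Id: <1@example.org>\nSubject: hi\n\nSame body\n"
msg_b = "Message-Id: <2@example.org>\nSubject: other\n\nSame body\n"

print(body_md5(msg_a) == body_md5(msg_b))  # identical bodies, identical sums
print(len(body_md5(msg_a)))                # length of the hex digest
```

An MD5 digest is 128 bits, which is why the hex form is 32 characters long.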
This sounds like a useful issue to tackle. I've been hit by it a few
times, when from other parts of my site, or in other messages to
the list, I have referred to articles by filename ... only to find
that later, when I have rebuilt the archives to roll out new .rc
files, the filename has changed :(
However, I wonder about the MD5 method. Without knowing anything
about MD5, could it work with 8.3 filenames? I build my archives on a
DOS box, so am constrained to that format.
Such a method will not work under 8.3 filenames. The current method
is friendly to 8.3 systems.
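
To make the 8.3 constraint concrete, the sketch below checks whether a filename fits the DOS limit of 8 characters plus a 3-character extension. The helper fits_8_3 and both sample filenames are hypothetical. The sequential pattern is only an assumption about what a numbered archive file might look like, not a statement of the exact names produced.

```python
import hashlib


def fits_8_3(filename: str) -> bool:
    """Return True if filename has a base of 1-8 characters and an
    optional extension of 1-3 characters (the DOS 8.3 limit)."""
    base, dot, ext = filename.partition(".")
    if not base or len(base) > 8:
        return False
    if dot and (not ext or len(ext) > 3 or "." in ext):
        return False
    return True


# A filename built from a full hex MD5 digest (32 characters).
digest_name = hashlib.md5(b"example body").hexdigest() + ".htm"

# A hypothetical sequentially numbered filename.
sequential_name = "msg%05d.htm" % 42

print(fits_8_3(digest_name))      # the 32-character base is too long
print(fits_8_3(sequential_name))  # 8-character base, 3-character extension
```

Truncating the digest to 8 characters would fit, but it would throw away most of the bits that make collisions unlikely.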
I am also not concerned about the lack of message-IDs: this problem
becomes visible quite quickly in my setup, as articles are repeatedly
added to the database on each archive build. When I spot this, I
just edit the mbox file and add a message-id of the form
(I know this is a prob for others, and I recognise the difficulty --
I'm just saying it's not a prob for me, though I hope it would be
supported for the benefit of others, esp those with more heavily
v2.3 will create a message-id for messages w/o one. The id has
the string "NO-ID-FOUND" in it so one can tell the id was generated
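
A sketch of how a fallback id of this kind could be produced is shown below. Only the "NO-ID-FOUND" marker comes from the description above. The exact id layout, the "@localhost" suffix, and the choice to derive the id from an MD5 of the message are assumptions for illustration and are not taken from the v2.3 code.

```python
import hashlib
from email import message_from_string

MARKER = "NO-ID-FOUND"  # marker string named in the message above


def message_id_or_generated(raw_message: str) -> str:
    """Return the message's own Message-ID, or a generated one.

    Assumption: the generated id is derived from the message content,
    so the same message gets the same id on every archive build.
    """
    msg = message_from_string(raw_message)
    existing = msg.get("Message-ID")  # header lookup is case-insensitive
    if existing and existing.strip():
        return existing.strip()
    digest = hashlib.md5(raw_message.encode("utf-8")).hexdigest()
    return "<%s.%s@localhost>" % (digest, MARKER)


with_id = "Message-Id: <1@example.org>\nSubject: hi\n\nBody\n"
without_id = "Subject: hi\n\nBody\n"

print(message_id_or_generated(with_id))
print(MARKER in message_id_or_generated(without_id))
```

Deriving the id from the content rather than from a counter or timestamp means a rebuild does not add the same article to the database twice.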
So it occurred to me that one way of implementing this would be to
create a new .db file (e.g. filename.db), which would record the
filenames used for each message ID and for each MD5 sum. That way
the chances of a duplicate occurring are *very* low: it would require
a duplicate MD5 sum *and* a duplicate or missing msg-id.
AFAICS, mhonarc.db is wiped when the archives are rebuilt ... and all
that would be needed is to ensure that filename.db is not wiped on a
rebuild, and its data reused. That way, we could retain the current
flexibility of filename format (which has other advantages, such as
being reasonably transparent) and add permanency.
How does that sound?
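
The proposal above could be sketched roughly as follows. This is an illustration only: the class FilenameDB, the JSON storage format, the key layout, and the numbered filename pattern are all assumptions, and nothing here reflects how mhonarc.db is actually stored.

```python
import json
import os
import tempfile


class FilenameDB:
    """Persistent map from (message-id, body MD5) to an archive filename."""

    def __init__(self, path):
        self.path = path
        self.names = {}
        # Reuse earlier assignments if the file survived a rebuild.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.names = json.load(fh)

    @staticmethod
    def key(message_id, body_md5):
        # A missing message-id becomes an empty string, so a clash needs
        # the same MD5 sum *and* the same (or a missing) message-id.
        return "%s|%s" % (message_id or "", body_md5)

    def filename_for(self, message_id, body_md5):
        k = self.key(message_id, body_md5)
        if k not in self.names:
            # Hypothetical numbering: next free number, 8.3 friendly.
            self.names[k] = "msg%05d.htm" % len(self.names)
        return self.names[k]

    def save(self):
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.names, fh, indent=1)


db_path = os.path.join(tempfile.mkdtemp(), "filename.json")

# First build: two messages arrive in one order.
first = FilenameDB(db_path)
name_a = first.filename_for("<a@example.org>", "0" * 32)
name_b = first.filename_for(None, "1" * 32)
first.save()

# Rebuild: same messages in the opposite order keep their filenames.
rebuilt = FilenameDB(db_path)
name_b2 = rebuilt.filename_for(None, "1" * 32)
name_a2 = rebuilt.filename_for("<a@example.org>", "0" * 32)
print(name_a == name_a2, name_b == name_b2)
```

The point of the separate file is that the filename assignment outlives the rebuild, while the numbered names themselves stay short and readable.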
Changing the v2.x code base to support different filenames from the
current convention will take some work. Also, if such a feature
were to be added to v2.x, the current filename style should still be
supported, i.e., alternate schemes would be triggered by a resource.
Using message-ids (or MD5 sums) is something I will look into
for v2.x, but after v2.3 is released.
Earl Hood | University of California: Irvine
ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu | Electronic
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME