On 2 Sep 98 at 21:53, Christopher Lindsey wrote:
So, one potential feature for the future might be an option to use
reproducible filenames for messages. Like naming the file after the MD5
checksum of a message, or the message ID, or something else that is
statistically likely to be unique.
I agree with you 100%, and in fact there was a thread about this very
same topic about a month ago on this list. Because there are so many
broken (read that as non-RFC compliant) MUAs out there, it's difficult
to guarantee that unique Message-Ids will be available in each message.
For this reason I prefer MD5 sums of the message body -- MD5 is a
128-bit hash, so there is statistically only a 1 in 2^128
(1:340282366920938463463374607431768211456) chance of matching a
false positive. For anyone interested, I've written some sendmail
8.9.1 patches to add md5sums at the sendmail level (based on the work of
Martin Hamilton) and also have a procmail recipe to do the same.
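As a rough illustration of the idea (not the sendmail patches or the procmail recipe mentioned above), here is a minimal Python sketch that derives a filename from the MD5 sum of a message body. The function name md5_filename, the "msg-" prefix and the ".html" extension are assumptions made for the example.

```python
import hashlib
from email import message_from_bytes


def md5_filename(raw_message: bytes, ext: str = ".html") -> str:
    """Return a reproducible filename based on the MD5 of the message body."""
    msg = message_from_bytes(raw_message)
    # For a single-part message this returns the body as bytes.
    body = msg.get_payload(decode=True)
    if body is None:
        # Multipart message: fall back to hashing the whole raw message
        # (an assumption of this sketch, not something the thread specifies).
        body = raw_message
    return "msg-" + hashlib.md5(body).hexdigest() + ext
```

Because only the body is hashed, the name stays the same when the archive is rebuilt. Note that two different messages with byte-identical bodies would also get the same name.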
This sounds like a useful issue to tackle. I've been hit by it a few
times, when from other parts of my site, or in other messages to
the list, I have referred to articles by filename ... only to find
that later, when I have rebuilt the archives to roll out new .rc
files, the filename has changed :(
However, I wonder about the MD5 method. Without knowing anything
about MD5, could it work with 8.3 filenames? I build my archives on a
DOS box, so am constrained to that format.
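One possible answer, sketched in Python: an MD5 sum is 32 hex characters, so it would have to be truncated to fit an 8.3 name. The helper md5_83_name and the "htm" extension below are hypothetical, chosen only for the example.

```python
import hashlib


def md5_83_name(body: bytes, ext: str = "htm") -> str:
    """Squeeze an MD5 sum into a DOS 8.3 filename by truncation."""
    digest = hashlib.md5(body).hexdigest()
    # Keep the first 8 hex characters (32 bits) and at most 3 for the extension.
    return (digest[:8] + "." + ext[:3]).upper()
```

The cost is that only 32 of the 128 bits survive, so an accidental clash becomes plausible once an archive holds on the order of tens of thousands of messages.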
I am also not concerned about the lack of message-IDs: this problem
becomes visible quite quickly in my setup, as articles are repeatedly
added to the database on each archive build. When I spot this, I
just edit the mbox file and add a message-id of the form
poster's_name_YYMMDDHHMMSS_something_random(_at_)no-valid-msg-id
(I know this is a prob for others, and I recognise the difficulty --
I'm just saying it's not a prob for me, though I hope it would be
supported for the benefit of others, esp those with more heavily
automated systems).
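For more automated setups, the manual fix-up described above could be scripted. The following Python sketch builds a placeholder message-id in the same shape; the function name placeholder_msgid, the angle brackets and the use of 8 random hex characters are assumptions of the example.

```python
import secrets
from datetime import datetime


def placeholder_msgid(poster: str, when: datetime) -> str:
    """Build a stand-in message-id: name_YYMMDDHHMMSS_random@no-valid-msg-id."""
    name = "_".join(poster.split())        # replace whitespace with underscores
    stamp = when.strftime("%y%m%d%H%M%S")  # YYMMDDHHMMSS
    rand = secrets.token_hex(4)            # 8 random hex characters
    return f"<{name}_{stamp}_{rand}@no-valid-msg-id>"
```

The random part changes on every call, so the generated id has to be written back into the mbox file once, as in the manual procedure, for it to stay stable across rebuilds.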
So it occurred to me that one way of implementing this would be to
create a new .db file (e.g. filename.db), which would record the
filenames used for each message ID and for each MD5 sum. That way
the chances of a duplicate occurring are *very* low: it would require
a duplicate MD5 sum *and* a duplicate or missing msg-id.
AFAICS, mhonarc.db is wiped when the archives are rebuilt ... and all
that would be needed is to ensure that filename.db is not wiped on a
rebuild, and its data reused. That way, we could retain the current
flexibility of filename format (which has other advantages, such as
being reasonably transparent) and add permanency.
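To make the proposal concrete, here is a minimal Python sketch of such a persistent map. It is not how MHonArc works: the class FilenameDB, the JSON storage and the counter-based msgNNNNN.html names are all assumptions for illustration. It follows one reading of the idea, keying each filename on the pair (message-id, MD5 sum), so a clash needs a matching MD5 sum and a matching or missing message-id.

```python
import json
import os


class FilenameDB:
    """Persistent map from (message-id, MD5 sum) to an archive filename."""

    def __init__(self, path):
        self.path = path
        self.data = {"names": {}, "next": 0}
        # Reuse earlier assignments if the file survived a rebuild.
        if os.path.exists(path):
            with open(path) as fh:
                self.data = json.load(fh)

    def filename_for(self, msgid, md5sum):
        # A missing message-id is stored as an empty string.
        key = f"{msgid or ''}|{md5sum}"
        names = self.data["names"]
        if key not in names:
            # New message: hand out the next filename in sequence.
            names[key] = "msg%05d.html" % self.data["next"]
            self.data["next"] += 1
        return names[key]

    def save(self):
        with open(self.path, "w") as fh:
            json.dump(self.data, fh)
```

A rebuild would load the file, call filename_for for each message, and save again at the end, so previously published names keep pointing at the same articles.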
How does that sound?
Best wishes,
Claire
--
Claire McNab -- Claire(_at_)siberia(_dot_)demon(_dot_)co(_dot_)uk