Re: [approved] Function mapping messages to IDs?

2003-09-09 22:06:44
On September 9, 2003 at 19:47, Alejandro Forero Cuervo wrote:

If the input  is from a MH-style folder,  mhonarc will actually
do a  numeric sort on  the directory contents first  to provide
some  parallel  to  filename numbers.   However,  if  MHPATTERN
resource is customized, the order in which the files are processed
is somewhat arbitrary.
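
As a rough illustration of why the numeric sort matters for MH-style
folders, here is a small Python sketch. It is not MHonArc's actual
(Perl) code, and the helper name mh_order is hypothetical; it only
contrasts numeric ordering with plain string ordering of file names.

```python
def mh_order(names):
    """Return MH-style message file names sorted by numeric value.

    Hypothetical helper: only all-digit names are treated as messages.
    """
    return sorted((n for n in names if n.isdigit()), key=int)


names = ["10", "9", "100", "1", ".mh_sequences"]
numeric = mh_order(names)                         # numeric sort
lexical = sorted(n for n in names if n.isdigit()) # plain string sort
print(numeric)
print(lexical)
```

With a string sort, "10" and "100" land before "9", so any numbering
derived from processing order would differ from the file numbers.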

I'm using

$ mhonarc -rcfile $file -mhpattern '^[^\.]' $maildir/{cur,new}

to generate the archive and the same command with -add to keep it
up to date.

Would it be  too hard to have MHonArc sort  the messages by their
date  (rather than  by their  filename) and  use that  order when
assigning them IDs?

It would not be hard, but it adds extra overhead since each file
would have to be stat'ed.

BTW, MHonArc is designed so that the message number really does
not  mean  much.   Why  is  the  message  numbering  assignment
important to you?

Because the URL given to each  message depends on the number.  If
the order is  somewhat arbitrary one could run  into trouble when
regenerating the archive from different sources as the URLs would
change and links from other locations would break.

I think it would be best to use some criterion that allows MHonArc
to  always  give  the  same  file  names  to  the  same  messages
regardless of whether  the archive is generated from a  MBox or a

This issue has been brought up before and it is a known problem.
The date-sorting method you advocate does not solve the real problem:
numbering will still change if even one of the original raw messages
has been removed when the archive is rebuilt.
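
A minimal Python sketch of that failure mode, assuming a hypothetical
number_by_date helper that numbers messages sequentially in date
order (this is an illustration, not how MHonArc assigns numbers):

```python
def number_by_date(messages):
    """Assign sequential numbers to messages in date order.

    `messages` is a list of (message_id, unix_timestamp) pairs.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(messages, key=lambda m: m[1])
    return {msg_id: n for n, (msg_id, _) in enumerate(ordered, start=1)}


original = [
    ("<a@example.org>", 100),
    ("<b@example.org>", 200),
    ("<c@example.org>", 300),
]
before = number_by_date(original)
# Rebuild after the raw copy of the second message has been deleted.
after = number_by_date([m for m in original if m[0] != "<b@example.org>"])
print(before["<c@example.org>"], after["<c@example.org>"])
```

The third message gets a different number after the rebuild, so a URL
based on that number would no longer point to it.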

It is worth noting that your case is somewhat unusual since you changed
storage formats for the raw data, causing them to be processed in a
different order.

This criterion  should also  give  the messages  the  same  names
regardless  of whether  the archive  has been  constantly updated
using the ``-add'' argument or is  rebuilt from the ground up, and
the only criterion I can think of  to make this possible is using
the date (actually the date of arrival to the archive, hmm).

No, the best method would be to use the message-id since message-ids
are unique while you can have multiple messages with the same date
(which can provide a source of numbering inconsistency depending
on how the sorting algorithm works).
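
To make the idea concrete, here is a hedged Python sketch of deriving
a file name from the message-id alone. The function name filename_for,
the "msg-" prefix, and the use of a SHA-256 digest are all assumptions
made for illustration; this is not MHonArc's naming scheme.

```python
import hashlib


def filename_for(message_id):
    """Derive a file name from a Message-ID (hypothetical scheme).

    The hex digest is used only as a filesystem-safe encoding of the id.
    """
    digest = hashlib.sha256(message_id.strip().encode("utf-8")).hexdigest()
    return "msg-" + digest + ".html"


ids = ["<a@example.org>", "<b@example.org>", "<c@example.org>"]
full = {i: filename_for(i) for i in ids}
# Different processing order, and one raw message removed.
partial = {i: filename_for(i) for i in reversed(ids) if i != "<b@example.org>"}
print(all(full[i] == partial[i] for i in partial))
```

Because the name depends only on the message itself, it does not
change with processing order, with the source format, or when other
messages are removed.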

Unfortunately, such a change at this time brings up compatibility
issues.  It may be possible to make it a configurable option.
Using message-ids for filenames has been on the TODO list for

Note, the annotation feature of mhonarc (does anyone even use
it?) actually uses message-id filenames so annotations can be preserved
on rebuilds or shared across multiple archives.

I recently  had to  rebuild my  HTML archive  (as I  changed some
options and also as I converted  my archive from Mbox to Maildir)
and I noticed that the URLs had changed.

Later versions of MHonArc have the RECONVERT resource.  Although
less efficient than doing a virgin rebuild, it allows you to
effectively rebuild an archive but preserve message numbers.

The mharc system, <>, deals with
the problem by providing a "Permanent Link" for bookmarking purposes.
It utilizes the underlying search engine to provide persistent
links to messages.  You can see an example of this in the
mail archives.


