mhonarc-users

Re: persistently linking to within archives

1999-11-21 21:16:05
Sorry to bury you in arcana.  

Given some elapsed time, I have another idea I like better.

Look up the Storage Resource Broker from San Diego Supercomputer Center
<http://www.sdsc.edu/DICE/>.  If you don't just use this as your retrieval
engine, you should study it enough to steal their logic.  They have the
same problem that the file path to resources may change, and they want to
offer a persistent handle to the resources anyway.  They use attribute
patterns as the persistent ID.  That is the sum and substance of what I know
about it except that this is a robust solution to the problem you are
wrestling with.

Al

[however, some details follow...]

At 05:59 PM 11/21/99 -0500, Nathaniel Irons wrote:
On 11/21/99 at 2:59 PM, asgilman(_at_)iamdigex(_dot_)net (Al Gilman) wrote:

The point is to continue the pattern where Resent-ID: the header for a
mailing list manager which sends forward a message lacking a
Message-ID: from the originating node.


Could you rephrase this?  I don't understand it.

When MHonArc receives a message, if it does not already have a Message-ID
header or a Resent-ID header or any other header which it trusts to provide
a unique ID, MHonArc would assign the message an Archive-ID header or else
call it an X-Archive-ID header.  The persistent URL scheme would use the ID
from any of these headers, the first one from the "search list" that was
present in the message, as the retrieval key for the persistent URL.  All
received messages would be guaranteed to have a Unique ID in the class that
the retrieval code checks by virtue of the willingness to add a header to
the message if it had no other.
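
A minimal sketch of that search list in Python, for illustration only. It
assumes the header names used in this thread (Message-ID, Resent-ID,
X-Archive-ID). The function name ensure_unique_id, the domain
"archive.example", and the use of email.utils.make_msgid as a stand-in for an
ID server are hypothetical, and none of this is existing MHonArc behavior.

```python
from email import message_from_string
from email.utils import make_msgid

# Search list from the discussion above, in priority order.
SEARCH_LIST = ("Message-ID", "Resent-ID", "X-Archive-ID")


def ensure_unique_id(msg, domain="archive.example"):
    """Return (header_name, value) for the first ID header that is present.

    If no header from the search list is present, add an X-Archive-ID
    header (hypothetical name) so the message still gets a retrieval key.
    """
    for name in SEARCH_LIST:
        value = msg.get(name)
        if value and value.strip():
            return name, value.strip()
    # Stand-in for an ID server: make_msgid generates a fresh unique ID.
    new_id = make_msgid(domain=domain)
    msg["X-Archive-ID"] = new_id
    return "X-Archive-ID", new_id
```

Because the fallback header is written into the message itself, a later pass
over the same message finds the same key instead of minting a new one.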


I have no prior familiarity with the MS-tnef correlator attribute, or
with <mid> URLs.  I'll continue to spelunk through the W3C site, but
it's about as maddening as it's ever been if one doesn't already know
what one wants and where to find it -- I'd greatly appreciate some
pointers on what exactly it is that you're referring to.


Yes, you won't find the definitions of URLs by poking around www.w3.org.
The defining documents for these things are in the IETF documents which are
buried in even harder to find places.

What I do for IETF documents is slow, but sure: find the RFC
which contains the current definition of STD 1.  That is an index to all
currently recognized standards-track RFCs.  Then you read titles and browse
the text of various RFCs until you find the right one.

This gives the processing software a search-list of Message-ID,
Resent-ID, Archived-ID with which to satisfy the need for a Unique-ID.
 Compare with the abortive introduction of the MS-tnef correlator
attribute.

I'm not clear on your usage of 'search-list', so I don't understand how
it impacts the need for a unique ID.  I haven't discovered, at this
point, why I'd ever want to get into the business of verifying
uniqueness (or assigning supplementary unique IDs) beyond the
Message-ID, with the MD5 replacement as a backup.  The prospect seems to
fall somewhere between daunting and horrifying. 

I wasn't proposing that you validate the uniqueness of the Message-ID.  I
was proposing that you check for a Resent-ID if there is no Message-ID
because it is supposed to satisfy the same uniqueness property as a
Message-ID.  I was also assuming that you would have access to some sort of
an ID server that would get you an ID that was not the same as any
Message-ID assigned within your domain.

If you are going to mix Message-IDs with MD5 hashes in the same syntax,
how would you be assured of non-conflicting values?  Or would you be using
syntactically different URLs for the messages that have Message-ID and
those that don't?  That thought was what sent me off thinking about the SRB.

If you are going to put up a retrieval CGI then it makes sense to make the
URL of the form


http://path.to.your.node/cgi-bin/get-from-archive?parm1=val1&parm2=...

And then either message-id or mid would be the parameter name in the
searchpart for the message id and you could use a long name or 'hash' for
the parameter name of the digest MD5 and be done.
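
A sketch of how such URLs might be built, assuming the placeholder base URL
above and the parameter names "mid" and "hash" suggested in this message. The
function name persistent_url is hypothetical, and stripping the angle brackets
from the Message-ID is an assumption borrowed from how mid URLs are written.

```python
import hashlib
from urllib.parse import urlencode

# Placeholder base URL taken from the example above.
BASE = "http://path.to.your.node/cgi-bin/get-from-archive"


def persistent_url(message_id=None, raw_message=b""):
    """Build a retrieval URL keyed on the Message-ID or on an MD5 digest.

    The two kinds of key travel under different parameter names, so a
    digest can never be mistaken for a Message-ID or the other way round.
    """
    if message_id:
        query = {"mid": message_id.strip("<>")}
    else:
        # No Message-ID: fall back to the MD5 digest of the raw message.
        query = {"hash": hashlib.md5(raw_message).hexdigest()}
    # urlencode percent-escapes characters such as '@' in the value.
    return BASE + "?" + urlencode(query)
```

Keeping the two keys under separate parameter names is one way to answer the
non-conflict question raised above without inventing a second URL syntax.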

I have run across one case where the message-ID is not sufficient to
determine the message's uniqueness -- when we start talking about
spanning multiple MHonArc archives, we run across the possibility that
the same message will be legitimately sent to multiple lists.  

Yes, then you are looking to find not a message but a location in the
thread structure.  In the above multi-parameter keying scheme you could
simply add archive=thread to the &mid=foo(_at_)bar to make the URL return what
you want.
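
On the retrieval side, the multi-parameter keying could look roughly like the
sketch below. The INDEX table, its archive names and file paths, and the
function name resolve are all hypothetical; only the "archive" and "mid"
parameter names come from the text above.

```python
from urllib.parse import parse_qs

# Hypothetical index: (archive name, message id) -> location in that archive.
INDEX = {
    ("widgetlist-a", "foo@bar"): "widgetlist-a/msg00012.html",
    ("widgetlist-b", "foo@bar"): "widgetlist-b/msg00340.html",
}


def resolve(query_string):
    """Return every indexed location matching the query's mid and archive.

    Without an archive parameter, all copies of the message are returned;
    with one, only the copy in that archive's thread structure.
    """
    params = parse_qs(query_string)
    mid = params.get("mid", [None])[0]
    archive = params.get("archive", [None])[0]
    if mid is None:
        return []
    return sorted(
        path
        for (arch, m), path in INDEX.items()
        if m == mid and (archive is None or arch == archive)
    )
```

With this table, a query for mid=foo%40bar alone matches both lists, while
adding archive=widgetlist-a narrows it to the single copy in that list.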


This is not much of a factor in MHonArc's multi-archive annotation
(because the note is likely to be relevant to all instances of a
message), but it is a serious flaw in a monolithic shared repository. 
If someone's looking for the message beginning a specific thread in
Widgetlist A, it won't do for them to get instead the same announcement
as it appeared in Widgetlist B.

It's not much of a factor in MHonArc's multi-archive indexing because the
latter doesn't exist yet.

So, while I admit that the message-ID is insufficient in at least one
common case, I'm inclined to look for the simplest possible resolution,
and I don't understand your suggestions well enough to evaluate them.

Do give the SRB a look.

thanks for your time.

Thanks for your time if you do anything with this, whether following my
suggestions or otherwise.

Al

 -nat