RE: reproducible URLs

1998-09-10 08:30:30
Your message was fun, Jeff.

I reasoned this out similarly, thinking along the lines of base-64 used in
MIME.  The permissible character set for DOS-platform file names contains at
least 46 characters.  The number of different names expressible in 8 base-46
characters is sufficient to have a minuscule collision probability for
archives of any reasonable size.  A 100,000 message archive seems two orders
of magnitude too high for MHonArc's basic design; anything that large using
a filesystem as its database needs to be organized hierarchically.  That
would add a subdirectory namespace into the quota.

-- SP

Anyway, sorry I didn't jump in then, but the kind-of-fun question was
implicitly raised: how many bits of randomness do you need for
reproducible URLs in MHonArc?  (Hey, it's not every day that real life
questions can be tackled like problem sets!)

Now if we are restricted to ending the filenames with something like
.htm, then there are only about 41 bits of randomness, and then we
run about 1% risk of collision for a puny n=100,000 message archive.
That's pushing it.
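
For anyone who wants to redo the arithmetic, here is a minimal sketch of the birthday-style estimate. It assumes the 41 bits come from 8 characters drawn from a 36-symbol alphabet (a-z plus 0-9), which is a guess and not something stated above; the function names are made up for illustration.

```python
import math

def bits_of_randomness(alphabet_size, length):
    # Each character contributes log2(alphabet_size) bits.
    return length * math.log2(alphabet_size)

def collision_probability(n_messages, bits):
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2N)), with N = 2**bits.
    space = 2.0 ** bits
    return -math.expm1(-n_messages * (n_messages - 1) / (2.0 * space))

# Assumed inputs: 8-character names, 36 symbols, 100,000 messages.
bits = bits_of_randomness(36, 8)
p = collision_probability(100_000, bits)
print(f"{bits:.1f} bits, collision probability {p:.2%}")
```

With these assumed inputs the estimate comes out a little under 0.2%, somewhat below the rough 1% quoted above, so the quoted figure is best read as an order-of-magnitude bound.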

Ok, one last note. If we use a real filesystem, with upper and lower
case letters in the filenames, we'd still need 10 characters in the
filename to meet or exceed the acceptable safety margin (57 bits). So
those lower case letters don't help us much in the region we are
interested in.
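
A small sketch of the character count, assuming a 62-symbol alphabet for mixed case (a-z, A-Z, 0-9) and a 36-symbol alphabet for single case; both alphabet sizes and the helper name are assumptions for illustration.

```python
import math

def chars_needed(target_bits, alphabet_size):
    # Smallest name length whose total entropy reaches target_bits.
    return math.ceil(target_bits / math.log2(alphabet_size))

mixed_case = chars_needed(57, 62)   # a-z, A-Z, 0-9 (assumed)
single_case = chars_needed(57, 36)  # a-z, 0-9 (assumed)
print(mixed_case, single_case)
```

Under these assumptions the sketch gives 10 characters for mixed case and 12 for single case, so the extra letters save only a couple of characters.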

Using MD-5 checksums for filenames is complete overkill, statistically
speaking. They are 128 bits, and would consume 20-odd characters in
the filename. 10-character filenames would do the trick nicely. There
is certainly no need to combine MD-5 and message-IDs from a
statistical standpoint.
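
To see where the "20-odd characters" comes from, here is a rough sketch that encodes a full MD-5 digest in a 62-symbol alphabet. The alphabet, the function name, and the sample message-ID are all hypothetical; this is not how MHonArc names its files.

```python
import hashlib
import math
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols (assumed)

def md5_filename(message_id):
    # Hash the message-ID and treat the 128-bit digest as one big integer.
    digest = hashlib.md5(message_id.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    # Fixed width: enough base-62 digits to hold any 128-bit value.
    width = math.ceil(128 / math.log2(len(ALPHABET)))
    chars = []
    for _ in range(width):
        value, rem = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

name = md5_filename("<hypothetical-id@example.org>")
print(len(name), name)
```

The full digest needs 22 characters in this encoding, more than twice the 10 that the estimate above calls for.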
