nmh-workers

Re: [Nmh-workers] indexing

2011-02-05 21:32:36
From: Valdis(_dot_)Kletnieks(_at_)vt(_dot_)edu
Date: Sat, 05 Feb 2011 20:22:35 -0500

I've seen this idea several times, and I always have the same question
- how would we deal with index/cache synchronization?  One of the
reasons I'm still using MH/exmh is because the one message per file
paradigm means that you can do interesting things with regular Unix
commands - except if you screw up and use /bin/mv or /bin/rm rather
than refile and rmm, you end up with the index no longer matching
reality.

right now i have an index that can be kept up to date with existing hooks
and some small outboard programs, and i have a couple of scripts that can
rebuild the index for a single folder or for all folders.  i have not yet
built the incremental rebuild/checker that only cleans up after "rm" and
"mv" but i agree that one is needed and it must be Really Fast for folders
that are only slightly out of synch.

anything that looks at the index will have to check for index freshness
and be ready to call the incremental rebuild/checker as nec'y.
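
a rough sketch of what such an incremental rebuild/checker might look like, in python for brevity. everything here is an assumption made for illustration: the index is modelled as a plain dict from message number to (size, mtime), and `scan_folder` and `reconcile` are hypothetical names, not anything in nmh. the point is only that cleaning up after "rm" and "mv" needs one directory scan and a stat() per message, with no open() of any message file.

```python
import os

def scan_folder(folder_dir):
    """Map message number -> (size, mtime_ns) for every all-digit file name."""
    found = {}
    with os.scandir(folder_dir) as entries:
        for entry in entries:
            name = entry.name
            # Skip .mh_sequences, backup files, subfolders, and anything non-numeric.
            if name.isascii() and name.isdigit() and entry.is_file():
                st = entry.stat()
                found[int(name)] = (st.st_size, st.st_mtime_ns)
    return found

def reconcile(index, folder_dir):
    """Bring the in-memory index in line with the directory.

    Returns (removed, reindexed): message numbers dropped from the index,
    and message numbers that are new or whose size/mtime changed.
    """
    on_disk = scan_folder(folder_dir)
    removed = sorted(n for n in index if n not in on_disk)
    reindexed = sorted(n for n, sig in on_disk.items() if index.get(n) != sig)
    for n in removed:
        del index[n]
    for n in reindexed:
        # A real index would re-parse the message headers here (assumption).
        index[n] = on_disk[n]
    return removed, reindexed
```

under this sketch a "mv 3 7" shows up as message 3 removed and message 7 reindexed, and a folder that is already in synch costs one scan and returns two empty lists.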

This may be easier to deal with on recent Linux kernels, where you can use
stuff like the inotify facility and leave a process running to catch such
activity and clean up the cache.  But that's hell on portability....

not only hell on portability, but also a lot of moving parts and likely to
be unreliable.  and, "not the MH way."

---

Date: Sat, 5 Feb 2011 18:04:14 -0800 (PST)
From: Lyndon Nerenberg <lyndon(_at_)orthanc(_dot_)ca>

If you're willing to live with "almost 100% accurate" you can go a
long way just by comparing the index/cache file mod times with the
directory mod time.

i think we can get all the way to 100% accuracy by looking at the ctime
of the directory and the mtime of the index and doing incremental fixage
before accessing the index.  this should be pretty rare.
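
the freshness test could be sketched roughly as below, again in python. `index_is_stale` is a hypothetical name, and the sketch assumes the index file lives outside the folder directory, so that writing the index does not itself bump the directory's ctime. note that editing an existing message in place does not touch the directory's timestamps at all, so this check only catches adds, removes and renames.

```python
import os

def index_is_stale(folder_dir, index_path):
    """Return True if the folder directory changed at or after the last index write."""
    try:
        index_mtime = os.stat(index_path).st_mtime_ns
    except FileNotFoundError:
        return True  # no index yet, so a full build is needed
    dir_ctime = os.stat(folder_dir).st_ctime_ns
    # ">=" rather than ">": with coarse timestamps a tie is treated as stale,
    # which costs an extra check but never misses a change.
    return dir_ctime >= index_mtime
```

a reader of the index would call this first and run the incremental fixup only when it returns True.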

If you consider the messages to be immutable (which they are, with the
exception of anno mucking about adding headers) the only thing that's
really going to put you out of sync is if something renumbers the
files in the folder.

on the topic of 'anno', the IMAP protocol thinks that headers are immutable,
so much so that if they are changed then a new UID must be assigned.  i
think this means that a correct IMAP server must elide the 'anno' headers
but i haven't got that far yet.

And since the only way that's likely to happen is with pack, the
index/cache would get updated with the new file names.

well, also sortm, but your point is valid.

And as with the existing sequences implementation, the only way you get
100% consistency is by making the message store a black box, at which
point it's no longer MH.

i surely do like MH.  one day in... 1986? i was visiting jordan hubbard
when he lived in the oakland hills and his girlfriend at that time (kim
manton) noted that i was using ucbmail as my primary mailer and she
said, and i'll never forget this, "why aren't you using MH? everybody
who's anybody uses MH."  i'd never heard of MH, but i tried it and never
stopped.  most people i knew who used MH have stopped, and are now happy
gmail web clients or outlook or Mail.app IMAP clients, but i can't do it,
i need 'pick' and 'refile'.

i just need MH to be faster for 10GByte mail stores than it is, and i
need it to be reliable through an IMAP interface.  i see no reason why
i should ever delete e-mail, 10GByte is small by today's standards.  but
i know i won't be able to keep open()'ing every file to see what's going
on unless i'm willing to run two MH stores and put everything older than
five years into the one i never look at.  which "would no longer be MH."

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
http://lists.nongnu.org/mailman/listinfo/nmh-workers