MHonArc just processed its one millionth email on my computer.
As you can imagine, I'm extremely pleased. What great software!
It's been a lot of fun scaling up. Here's what I learned from my
experience over the last year and a half, from a technical standpoint.
The system is a single PC with an AMD K6-II processor, 256 megs of
RAM, and two 16-gig IBM IDE disks.
MHonArc:
* In batch mode, MHonArc adds n new messages in O(n) time no
matter how big the archive gets (tested up to 60,000 messages)
* The risk of an orphaned lock file is too great. It was better to
use -nolock and serialize the runs myself (see the wrapper sketch
after this list).
* It was better to buy RAM than to use -savemem
* Once in a blue moon, a perl process can crash and core dump.
No big deal if you remember to check process return values (the
wrapper sketch below checks them).
* htdig makes for an excellent search engine for MHonArc pages
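For the curious, here's the shape of that wrapper. This is a
simplified sketch in Python rather than my actual script, and the
paths are made up. The point is that a kernel flock() dies with the
process, so unlike a lock file it can never be orphaned, and the
wrapper checks the child's return value so a core-dumping perl run
doesn't pass silently.

    import fcntl
    import signal
    import subprocess

    ARCHIVE_DIR = "/archive/mylist"          # hypothetical paths
    LOCK_PATH = "/archive/mylist/.mha-lock"

    def add_folder(folder):
        # Add one mbox file or MH directory to the archive.
        # Serialize runs with a kernel flock: the lock vanishes with
        # the process, so there is no lock file to orphan.
        with open(LOCK_PATH, "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)
            result = subprocess.run(
                ["mhonarc", "-add", "-nolock",
                 "-outdir", ARCHIVE_DIR, folder])
        # Always check the return value.  A negative code means the
        # child died on a signal -- SIGSEGV is the core-dump case.
        if result.returncode < 0:
            sig = signal.Signals(-result.returncode).name
            raise RuntimeError("mhonarc died on signal " + sig)
        if result.returncode != 0:
            raise RuntimeError("mhonarc exited %d" % result.returncode)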
Stock Red Hat Linux 5.2:
* The default open-files limit (1000) is too low (see the first
sketch after this list).
* With a million messages on disk, mounting a partition takes
minutes, e2fsck can take an hour, and ls on a big directory can
take quite a few seconds.
* IDE disk throughput increased when I tweaked settings with hdparm
(see the hdparm sketch after this list).
* When you do a lot of writing to system logs, syslogd starts
hogging 25% of the processor. Rotating logfiles daily fixes
this problem.
* Better to put some limits on updatedb, the nightly cron job that
walks the whole filesystem to index it for locate.
* People can break into a stock system (due to security holes
in software bundled with the OS)
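To see whether the open-files limit is biting, here's a small
sketch, again in Python. Any process can raise its own soft limit up
to the hard cap; the system-wide cap is a kernel tunable that root
has to raise.

    import resource

    # Report the per-process open-files limits.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open files: soft=%d hard=%d" % (soft, hard))

    # Raise the soft limit to the hard cap for this process.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))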
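The classic hdparm tweaks look like this; the values below are
illustrative rather than my exact settings, and bad values can hang
or corrupt an IDE drive, so benchmark carefully on your own hardware.

    import subprocess

    DISK = "/dev/hda"   # first IDE disk; adjust for your system

    # -c1 = 32-bit I/O, -d1 = DMA, -u1 = IRQ unmasking,
    # -m16 = 16-sector multi-sector transfers.
    subprocess.run(["hdparm", "-c1", "-d1", "-u1", "-m16", DISK],
                   check=True)

    # -t times buffered disk reads; run before and after to compare.
    subprocess.run(["hdparm", "-t", DISK], check=True)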
Other:
* Certain emails can kill nmh.
-------------------
* My friend Brian Semmes was a math major. For some reason he
took the introductory electrical engineering class, and it wasn't
pretty. The professor said things like, "This resistor has 10^6 ohms;
heck, 10^6 is practically infinity, so we'll just substitute infinity
into this equation..." It drove him crazy, and I think of him
whenever I hear the word "million".