procmail

Splitting gigantic mboxes (was Re: splitting up mbox with formail?)

1998-08-27 07:01:24
1998-08-27-01:18:55 Christopher Lindsey:
To me, a large mailbox would consist of about 10,000 messages per month
(that's about what I get). That would mean that my mailbox would contain
60,000 messages in 6 months. I sure as heck wouldn't want to skim through it
all or even try to load it up in an MUA.

Hear, hear!

I also deal with monster volumes of email.

I've switched over entirely to Maildir in all my email handling. The only
place I still see mboxes is in the save folders of my netnews reader (slrn),
and whenever I want to process those I either convert them into Maildir
(e.g. for archival) or simply split them into one file per message.

Splitting an mbox into one file per message turns out to be preposterously
easy using GNU csplit:

        csplit -n4 - '/^From /' '{*}'

That will create an empty xx0000 which I delete, and leave the messages in
files named xx0001, xx0002, etc. If you have more than 9999 messages in a
folder then go -n6, or -n9, or whatever.
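For anyone who wants to try this before pointing it at a real archive, here is a minimal sketch of the same split run against a throwaway two-message mbox. The scratch directory, the file name sample.mbox, and the sample addresses are invented for illustration. It assumes GNU csplit and that every message starts with a "From " line; a body line that begins with an unescaped "From " would also trigger a split.

```shell
# Work in a scratch directory so the xxNNNN pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny two-message mbox, invented purely for illustration.
printf '%s\n' \
  'From alice@example.com Thu Aug 27 01:18:55 1998' \
  'Subject: first' \
  '' \
  'Hello.' \
  '' \
  'From bob@example.com Thu Aug 27 07:01:24 1998' \
  'Subject: second' \
  '' \
  'Hi there.' > sample.mbox

# Split at every line that starts with "From "; -s suppresses the byte counts
# that csplit would otherwise print for each piece.
csplit -s -n4 - '/^From /' '{*}' < sample.mbox

# The piece before the first "From " line is empty, so remove it when it is.
[ -s xx0000 ] || rm -f xx0000

# List the per-message files that remain.
ls xx*
```

The `[ -s xx0000 ]` check only deletes the first piece when it really is empty, so an mbox that happens to start with stray text loses nothing.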

Once they're split it's really easy to use shell tools to bundle messages into
batches, file them into categories, etc.
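As one possible illustration of the batching idea, the sketch below moves split messages into numbered directories, a fixed number per directory. The batch size, the batchNNN directory names, and the dummy xx files are all hypothetical stand-ins; this is not necessarily how anyone actually organizes an archive.

```shell
# Scratch directory with stand-ins for csplit output (invented sample data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in 1 2 3 4 5; do
  printf 'From user%s@example.com Thu Aug 27 07:01:24 1998\n' "$i" > "xx000$i"
done

batch_size=2   # hypothetical: messages per batch directory
count=0

# The glob is expanded once, before any file is moved, and in sorted order.
for f in xx*; do
  batch=$(printf 'batch%03d' $((count / batch_size)))
  mkdir -p "$batch"
  mv "$f" "$batch/"
  count=$((count + 1))
done

# Show which message landed in which batch.
ls -R
```

The same loop shape works for filing by category: replace the counter arithmetic with a test on the message, for example a grep on a header line, and pick the target directory from the result.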

If you are archiving all email traffic forever (which I do) then another dandy
tool to add to the mix is glimpse <URL:http://glimpse.cs.arizona.edu/>; it
takes a while to build the index, but that's a fine job to run out of cron at
night. Once the index is built it's a pleasingly quick way to root through big
archives of messages.

-Bennett
