procmail

Re: splitting a large mailbox

2001-04-17 08:10:22
Richard asked,

| I have many large mailboxes (a few thousand messages in each) and I'd like to
| be able to split them quickly (one-message-per-file style) for subsequent
| searching/processing. I thought to use formail but it turns out that
| splitting with formail and feeding to procmail to do the file writing takes a
| *very* long time.

You didn't say exactly how you're invoking formail and procmail, so perhaps
they can be sped up, but maybe you could use csplit instead?
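
A minimal sketch of the csplit idea, assuming GNU csplit (the -z option and
the '{*}' repeat count are GNU extensions). split_mbox is a hypothetical
helper name, not something that ships with procmail or formail. Note that
csplit breaks at every line beginning with "From ", so this relies on the
mailbox having body lines escaped as ">From ".

```shell
#!/bin/sh
# Sketch: split an mbox file into one file per message with GNU csplit.
# split_mbox is a hypothetical helper name used only for illustration.
split_mbox() {
    mailbox=$1
    # -s  suppress the per-file byte counts
    # -z  drop the empty piece before the first "From " line (GNU option)
    # -f  prefix for the output files, here "<mailbox>."
    # -n  use five-digit suffixes
    # '{*}' repeats the pattern as often as it matches (GNU extension)
    csplit -s -z -f "$mailbox." -n 5 "$mailbox" '/^From /' '{*}'
}
```

Calling split_mbox inbox in the directory holding inbox should leave one
file per message, named inbox.NNNNN, next to the original.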

What seems to slow things down the most is procmail's locking attempts.
This code:

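#!/bin/sh
# "pattern" stands for a glob matching the mailboxes to split
export mailbox   # exported so the sh -c child can expand it
for mailbox in pattern
 do FILENO=00001 formail -ns sh -c 'cat > "$mailbox.$FILENO"' < "$mailbox"
done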

despite the invocations of sh and cat, ran much faster than

#!/bin/sh
# '$FILENO' stays single-quoted so that procmail, not this shell, expands it
export mailbox
for mailbox in pattern
 do FILENO=00001 formail -ns \
  procmail -pm DEFAULT="$PWD/$mailbox."'$FILENO' /dev/null < "$mailbox"
done

(and I'm still not sure why I needed to give a full path for $DEFAULT instead
of assuming $PWD as the start when procmail had the -m option).

But the fastest thing I tried was to keep procmail but prevent the locking.
With this recipe in .splitrc,

 :0
 $mailbox.$FILENO

this ran like the wind in comparison:

#!/bin/sh
export mailbox
for mailbox in pattern
 do FILENO=00001 formail -ns procmail -pm ./.splitrc < "$mailbox"
done
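
On why the .splitrc recipe avoids locking: per the procmailrc man page, a
second colon on the recipe's first line requests a local lockfile, and mail
that falls through to $DEFAULT is written under the lockfile
$DEFAULT$LOCKEXT automatically. The two recipe forms are sketched side by
side below as alternatives, not as one rcfile; only the first form was in
.splitrc, and the second is shown purely for contrast.

```
 # Form used in .splitrc: no trailing colon, so no local lockfile
 :0
 $mailbox.$FILENO

 # Contrasting form (NOT in the original .splitrc): the trailing colon
 # makes procmail create $mailbox.$FILENO$LOCKEXT before each delivery
 :0:
 $mailbox.$FILENO
```

Since each message goes to its own freshly named file here, there is no
second writer for a lockfile to protect against.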

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
