procmail

splitting up huge mailbox (was mail assistance please)

2001-12-26 22:17:47
Gary Funck wrote (in reply to an old post from Michael Weiner),

| 1) split the incoming messages into folders which contain at most MSGMAX
| messages.

| If we take the integer division result of dividing FILENO by MSGMAX, we can
| use that as handle to create a filename, which in turn is a mailbox where
| the messages will be collected. We would invoke formail as follows:
| 
|    (FILENO=0; export FILENO; MSGMAX=500; export MSGMAX; \
|     rm -f ./mbox.????; \
|    formail -des sh -c 'n=`expr $FILENO / $MSGMAX`; m=`printf mbox.%04d $n`;
|      cat - >> $m' < mailbox)
| 
| where (for example) messages 1..500 are deposited in the file named
| 'mbox.0000', messages 501..1000 are deposited in the file named 'mbox.0001'
| and so on.
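
As a quick check of the arithmetic in the quoted scheme, here is a minimal sh
sketch that prints the folder name for a few message numbers without running
formail at all.  The function name mbox_name is made up for illustration, and
the sketch assumes FILENO counts from 0, as it does in the quoted command.

```shell
#!/bin/sh
# Hypothetical helper (not part of formail or procmail): map a message
# number to a folder name with the same expr/printf steps as the quoted
# command.
MSGMAX=500

mbox_name() {
    n=`expr "$1" / "$MSGMAX"`      # integer quotient, remainder discarded
    printf 'mbox.%04d\n' "$n"      # zero-pad the quotient to four digits
}

# Message numbers on either side of each folder boundary.
for fileno in 0 499 500 999 1000; do
    echo "$fileno -> `mbox_name $fileno`"
done
```

Only the quotient decides the folder, so every run of MSGMAX consecutive
message numbers lands in the same file.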

In fact, procmail can do the allotment instead of sh, expr, printf, and cat.

 FILENO=0 MSGMAX=500 formail -des procmail -p .allotrc < mailbox

where ~/.allotrc is as follows:

 :0: # procmailsc rounds fractions 0<x<1 up instead of down; override
 * $ $MSGMAX^0
 * $ -$FILENO^0
 mbox.0000 # or mbox.0001 to start from mbox.0001

 # Otherwise, take integer quotient, for which we need this:
 onechar=.

 :0
 * $ onechar ?? $FILENO^1 < $MSGMAX
 # if you are starting from mbox.0001, uncomment the next line:
 #* 1^0
 { }
 suffix = $= # will equal integer part of quotient (or of quotient+1)

 # accept if at least four digits:
 :0:
 * suffix ?? .... 
 mbox.$suffix

 # Otherwise, pad to four digits:
 :0E:
 * suffix ?? ...
 mbox.0$suffix
 :0E:
 * suffix ?? ..
 mbox.00$suffix
 :0E:
 mbox.000$suffix
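
For readers more at home in sh than in procmail, the padding recipes above
behave roughly like the following sketch.  The function name pad4 is
hypothetical; it only mirrors the length tests in the rcfile and is not
something procmail provides.

```shell
#!/bin/sh
# Hypothetical illustration of the padding recipes: pick the prefix from
# the length of the suffix, testing the longest case first.
pad4() {
    case $1 in
        ????*) echo "mbox.$1" ;;      # four or more characters: use as is
        ???)   echo "mbox.0$1" ;;     # three characters: one zero
        ??)    echo "mbox.00$1" ;;    # two characters: two zeros
        *)     echo "mbox.000$1" ;;   # one character: three zeros
    esac
}

pad4 7
pad4 42
pad4 1234
```

As in the rcfile, the order of the tests matters: each branch is reached only
when the longer cases before it have failed.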

Beyond that, I will point out that the -e switch slows formail -ds down
considerably.  If the input has a blank line between every two consecutive
messages, you don't need -e and shouldn't use it: run formail -ds instead of
formail -des.
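
One rough way to find out whether a mailbox qualifies is to count the "From "
lines that do not follow a blank line.  The sketch below is only a heuristic:
the function name unseparated is made up, the file name mailbox is taken from
the commands above, and a body line that happens to begin with an unescaped
"From " will be counted as well.

```shell
#!/bin/sh
# Hypothetical check: count "From " lines that are not preceded by a blank
# line.  The first line of the file is exempt.  A result of 0 suggests that
# the messages are already separated by blank lines.
unseparated() {
    awk 'NR > 1 && /^From / && prev != "" { n++ }
         { prev = $0 }
         END { print n + 0 }' "$1"
}

# Run the check only if a file named "mailbox" is actually present.
if [ -f mailbox ]; then
    unseparated mailbox
fi
```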

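formail's -n option starts the next procmail without waiting for the previous
one to finish, so with -n two procmails can be writing to the same folder at
once, and the local lockfiles are what keep their output apart.  Without -n
the procmails run strictly one after another; if you can also be sure that
nobody else is running this in the same directory at the same time, you can
probably speed it up further by dropping all the local lockfiles (the
trailing colons on the recipe flag lines).  Then again, if you do keep the
local lockfiles, you can speed it up by adding formail's -n option.  Just be
sure you use (a) both, (b) neither, or (c) the local lockfiles without -n;
don't use -n without the local lockfiles!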

The special case where MSGMAX=1 -- that is, you want one message per file --
can be done with no rcfile at all:

FILENO=0000 formail -des procmail -p DEFAULT='mbox.$FILENO' /dev/null < mailbox

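or start with FILENO=0001 if you prefer.  Note the strong (single) quotes
around 'mbox.$FILENO': they keep your shell from expanding $FILENO on the
command line, so that procmail receives the literal text and expands it
itself for each message.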
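
To see the difference the quoting makes, this small sh sketch prints the
argument text that a command would receive under each style.  The helper
show_arg is hypothetical and stands in for procmail only in the sense that it
shows what arrives on the command line.

```shell
#!/bin/sh
# Show the argument a command receives under each quoting style.
FILENO=0000
show_arg() { printf '%s\n' "$1"; }

show_arg 'mbox.$FILENO'    # strong quotes: passed through literally
show_arg "mbox.$FILENO"    # weak quotes: expanded once, by this shell
```

With weak quotes the name would be fixed before formail had numbered a single
message, so every message would go to the same file.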

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail