procmail

pruning mboxes based on the number of messages?

1999-12-11 11:41:56
I keep a number of mailbox folders that track messages received from
various mailing lists.  However, the mail in each mbox can mount up,
and I'm interested in keeping only, say, the last 100 or so messages
and archiving the rest on a periodic basis.  A nightly cron job
would check the number of messages in each mbox, and when the total,
N (for example, 250), exceeds the threshold M (for example, 100),
the script would copy the first N-M (e.g., 150) messages to an
archive mailbox and leave only the last M messages in the current
mailbox.  Do you know of any procmail scripts that already do
something like that?

The problem decomposes into:

0. lock the mailbox
1. determine the number of messages currently in the mailbox, N
2. if N > M then
3.    extract the first N-M messages and append them to the archive mailbox
4.    extract the last M messages and write them to a new copy of the mailbox
5.    rename the new copy over the original mailbox
6. end if
7. unlock the mailbox

I was thinking that steps 3 and 4 might be handled by these formail
switches, which take effect only when splitting with -s:

  -total  Output at most total messages while splitting.
  +skip   Skip the first skip messages while splitting.
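As a rough illustration of which messages the two switches select, here
is a hypothetical POSIX sh helper, mbox_slice (it is not part of formail),
that only mimics 'formail +skip -total -s' using awk.  It assumes every
message starts with a line beginning 'From ', so it counts messages the
same way "grep -c '^From '" does rather than with formail's own rules.

```shell
# mbox_slice SKIP [TOTAL]   (hypothetical helper, not part of formail)
# Reads an mbox on stdin, skips the first SKIP messages and prints at
# most TOTAL of the remaining ones (all of them if TOTAL is omitted).
# Assumption: each message begins with a line starting with "From ".
mbox_slice() {
    awk -v skip="$1" -v total="${2:--1}" '
        /^From / { n++ }    # n = number of the message being read
        n > skip && (total < 0 || n <= skip + total) { print }
    '
}
```

With N = 250 and M = 100, step 3 would then be 'mbox_slice 0 150 < mailbox'
and step 4 'mbox_slice 150 < mailbox', roughly what 'formail -150 -s' and
'formail +150 -s' would produce.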

But this still leaves the question: how many messages are in
the mailbox?  So far, I've thought of two approaches:

The first approach splits all the messages and starts a new csh
for each one, whose only purpose is to print the value of
$FILENO.  The last value is extracted with 'tail', and then
'sed' removes the leading zeros:
   set N = `setenv FILENO 000001; \
            formail -s csh -c 'echo $FILENO' < mailbox | \
            tail -1 | sed -e 's/^[0]*//'`
but this approach is slow and rather complicated.

The second approach is simpler and much faster:
   set N = `grep -c '^From ' mailbox`
but it may not be completely reliable, because a message body or
attachment could contain lines that begin with 'From ' (unless they
were escaped as '>From ' when the message was delivered).

Do any other approaches come to mind?  I'll probably go with the
second approach and accept the inaccuracies, unless there's something
better.
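One possible refinement of the second approach, sketched in POSIX sh as
a hypothetical helper, mbox_count: in an mbox a 'From ' line normally
starts a new message only at the top of the file or right after an empty
line, so counting only those lines avoids some false matches.  It is
still a heuristic; an unescaped 'From ' line that follows a blank line
inside a message body would still be counted.

```shell
# mbox_count FILE   (hypothetical helper)
# Counts "From " lines that are the first line of FILE or directly
# follow an empty line, the usual mbox message separator.
mbox_count() {
    awk '
        /^From / && (NR == 1 || prev == "") { n++ }
        { prev = $0 }          # remember the previous line
        END { print n + 0 }    # print 0, not "", for an empty file
    ' "$1"
}
```

On a mailbox where a body line reads 'From the desk of ...' without a
blank line before it, 'grep -c' counts it as a message and mbox_count
does not.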

Once N is known, will the following work?
   lockfile mailbox.lock
   set N = `grep -c '^From ' mailbox` # the number of messages
   set M = 100 # the threshold
   @ A = $N - $M   # the number of messages to archive
   if ($A > 0) then
        ## If the archive mailbox already exists, append an empty
        ## line to it to separate the new batch of messages
        ## from the old.
        if ( -e archive/mailbox ) echo "" >> archive/mailbox

        ## Extract the first $A messages and append them to the
        ## archive mailbox.  -total only takes effect while
        ## splitting, and -s must be the last option.
        formail -$A -s < mailbox >> archive/mailbox

        ## Create a new mailbox, skipping the first $A messages
        ## (+skip likewise needs -s).
        formail +$A -s < mailbox > mailbox.new

        ## Give the new mailbox the old one's modification time,
        ## then rename it over the original.
        touch -r mailbox mailbox.new
        mv mailbox.new mailbox

   endif
   rm -f mailbox.lock
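For comparison, here is a sketch of the same job in POSIX sh, using awk
instead of formail.  prune_mbox is a hypothetical name; it counts
messages with "grep -c '^From '" and splits the mailbox in a single pass.
It does no locking itself, so it should be run between
'lockfile mailbox.lock' and 'rm -f mailbox.lock', as in the csh script.

```shell
# prune_mbox MBOX ARCHIVE KEEP   (hypothetical helper)
# Appends all but the last KEEP messages of MBOX to ARCHIVE and rewrites
# MBOX so that only the last KEEP messages remain.  Messages are counted
# by lines starting with "From ".  NO locking is done here: the caller
# must hold the mailbox lock (e.g. via procmail's lockfile).
prune_mbox() {
    mbox=$1 archive=$2 keep=$3
    n=$(grep -c '^From ' "$mbox")
    a=$((n - keep))                  # number of messages to archive
    [ "$a" -gt 0 ] || return 0       # nothing to do

    # Single pass: messages 1..a are appended to the archive, the rest
    # are written to a temporary copy of the mailbox.
    awk -v a="$a" -v arch="$archive" -v out="$mbox.new" '
        /^From / { n++ }
        { if (n <= a) print >> arch; else print > out }
    ' "$mbox" || return 1

    # Keep the old modification time, then replace the original.
    touch -r "$mbox" "$mbox.new" &&
    mv "$mbox.new" "$mbox"
}
```

Writing to a temporary file and renaming it means the original mailbox
is only replaced once the new copy has been written completely.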