procmail

Re: pruning mbox's based on number of messages?

1999-12-12 07:42:34
On Sat, 11 Dec 1999 11:46:03 -0800, gary(_at_)Intrepid(_dot_)Com (Gary Funck)
wrote:
> "messages" is a shell script, which ultimately does the
> following:
>
>     mcount=`egrep -c "^From " $fname`
>
> which uses grep on '^From ' to get the message count. I think this
> isn't exactly right from a MIME-compliance point of view. Maybe
> someone will help to clarify this point.

MIME doesn't really say anything about multiple messages in a folder.
As long as you're using Berkeley mbox folders, any line which starts
with the letters "From " is a mailbox separator. (Some mbox programs
are stricter than this, and require e.g. the line before the From to
be empty, or the time stamp on the From_ line to be in a particular
format. This is mainly because there is very little real documentation
so everybody just ended up speculating :-)
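
To make the difference between the loose and the stricter reading
concrete, here is a minimal sketch. The function names count_loose and
count_strict are made up for illustration, and the "strict" rule shown
(a From_ line only counts at the top of the file or right after an
empty line) is just one of the variants mentioned above, not what any
particular mail program does.

```shell
# count_loose: every line starting with "From " counts as a separator.
count_loose() {
    grep -c '^From ' "$1"
}

# count_strict: a "From " line counts only on the first line of the
# file or directly after an empty line (one assumed "strict" variant).
count_strict() {
    awk '/^From / && (NR == 1 || prev == "") { n++ }
         { prev = $0 }
         END { print n + 0 }' "$1"
}
```

On a folder where some message body contains an unescaped line
beginning with "From ", the two counts will differ.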

There is a very similar format which keeps the From_ line but actually
uses Content-Length: (`banner UGH`) to keep track of where the current
message ends and the next one begins. Fortunately, this never became
very widespread. (Sun's mailtool used this, I think; any others?) In
folders in this format, a From_ line is not necessarily the beginning of
a new message.
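
For the curious, counting messages in a folder of that kind could look
roughly like the sketch below. It rests on several assumptions of mine:
that Content-Length: gives the number of body bytes following the empty
line which ends the headers, that every line ends in a single newline
byte, and that a message without the header falls back to plain From_
detection. The function name count_content_length is made up.

```shell
# count_content_length: count messages, skipping over each body by its
# Content-Length: value so that "From " lines inside a body are ignored.
count_content_length() {
    LC_ALL=C awk '
        state == "body" {
            remaining -= length($0) + 1     # +1 for the newline
            if (remaining <= 0) state = ""
            next
        }
        state == "head" {
            if ($0 == "") { state = (remaining > 0) ? "body" : ""; next }
            if (tolower($1) == "content-length:") remaining = $2 + 0
            next
        }
        /^From / { n++; state = "head"; remaining = 0 }
        END { print n + 0 }
    ' "$1"
}
```

LC_ALL=C is there so that length() counts bytes rather than
characters; a wrong Content-Length value will of course throw the
count off, which is rather the point of the complaint above.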

Ghod help you if you have folders of both types (or messages of both
types in the same folder :-)

FWIW, many Quoted-Printable encoders will change any leading From to
=46rom (or even any leading F to =46, or any From to =46rom, or any F
to =46 -- true paranoia knows no limits!) in order to distinguish
these from real From_ lines (helpfully, regardless of whether any
system in their neck of the woods may ever have heard of Berkeley mbox
format, of course). In the pre-MIME world, you would frequently see
these "fake" Froms changed into >Froms somewhere along the way.
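
Both kinds of escaping are easy to sketch with sed. The function names
below are made up, and the second one only imitates the mildest of the
Quoted-Printable behaviours described above (it rewrites the leading F
and does no other QP encoding); real encoders and delivery agents
differ in the details.

```shell
# escape_from: pre-MIME style, turn a body line starting with "From "
# into ">From " so it cannot be taken for a separator.
escape_from() {
    sed 's/^From />From /'
}

# qp_escape_from: Quoted-Printable style, encode the leading F as =46.
qp_escape_from() {
    sed 's/^From /=46rom /'
}
```

Both read a message body on standard input and write the escaped body
to standard output; lines with "From " elsewhere than at the start are
left alone.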

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition