procmail

Re: mbox anomaly

1999-12-21 23:20:22
"Jeremy M. Dolan" <jeremy(_at_)axistangent(_dot_)net> writes:
...
While setting up procmail, I ran into trouble feeding it large
mboxes that were written by mutt. I first suspected a problem
with procmail/mutt/pine's handling of From_ lines in MIME boundaries.
However, the attached file is a smaller, isolated test mbox with no
MIME multipart messages.
...
Reading this file produces strange results:

grep "^From " file - shows 4 messages
mutt -f file       - shows the first message, contains the whole file
pine -f file       - shows all 4 messages
procmail < file    - sees the whole file as 1 message, like mutt
hexdump -c file    - shows it as normal mbox format, \nFrom, no \r's
...

Okay, the _real_ problem is that the messages you're receiving contain
bogus Content-Length: header fields.  One of the variations on mbox
format says that the number of bytes specified by the Content-Length:
header field should be taken verbatim and not scanned for From_ message
separators.  At sites where this variation is used, the Content-Length
field should be set to the correct value by the Local Delivery Agent.
procmail will do so, unless it is invoked with the -Y flag, which tells
it to ignore the Content-Length: field and always escape embedded From_
lines.  The vast majority of sites use -Y (check your .forward file or
the sendmail.cf, or wherever procmail is invoked), so CL: fields
in incoming messages will _not_ be updated to reflect the actual message
size -- you told procmail to ignore them, and boy, does it ever.
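If you want to see which messages carry bogus CL: fields before deciding what to do, you can compare each declared Content-Length: value with the body bytes actually present before the next From_ line.  Here is a minimal sketch in POSIX sh and awk; `check_cl` is a hypothetical helper name, not part of procmail or formail.  It assumes the mailbox is split on From_ lines only (a "From " line at the start of the file or after a blank line), and it tolerates a one-byte difference because conventions differ on whether the blank line separating messages is counted.

```shell
# check_cl: hypothetical helper, not part of procmail/formail.
# For each message in an mbox (split on From_ lines only), print the
# declared Content-Length: and the body bytes actually present.
check_cl() {
    LC_ALL=C awk '
    function report(   d, flag) {
        if (n == 0) return
        flag = ""
        if (cl != "") {
            d = cl - body
            if (d < 0) d = -d
            if (d > 1) flag = "  <-- mismatch"   # allow for the separator blank line
        }
        printf "message %d: Content-Length %s, body %d bytes%s\n", n, (cl == "" ? "(none)" : cl), body, flag
    }
    # A From_ line at the top of the file or after a blank line starts a message.
    /^From / && (n == 0 || prevblank) {
        report()
        n++; inhdr = 1; cl = ""; body = 0; prevblank = 0
        next
    }
    # Inside the header: remember Content-Length, stop at the blank line.
    inhdr {
        if ($0 == "") inhdr = 0
        else if (tolower($0) ~ /^content-length:/) {
            cl = $0
            sub(/^[^:]*:[ \t]*/, "", cl)
        }
        next
    }
    # Body line: count its bytes plus the newline (LC_ALL=C makes length() count bytes).
    {
        body += length($0) + 1
        prevblank = ($0 == "")
    }
    END { report() }
    ' "$@"
}
```

Running `check_cl file` prints one line per message; any line ending in "<-- mismatch" marks a message whose CL: field a CL-trusting reader would misuse.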

So, let's consider the first message in the mailbox.  Its body is ~1450
bytes long, while its CL: field contains the value 11267.  Mutt comes
along, decides to pay attention to the CL: field, and sees the body of the
first message as encompassing the next three messages.  At some point
in the third, it resumes scanning for a From_ line and correctly splits off
the following message.  Pine, on the other hand, ignores the CL: field,
splits on the From_ lines, and when it rewrites the mailbox it drops
the CL: field, thus letting mutt correctly split the message later on.
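To make the mutt/pine difference concrete, here is a minimal sketch of a reader that trusts Content-Length: unconditionally: after each header it skips that many body bytes before looking for the next From_ line.  `count_cl_honoring` is a hypothetical name, and this is a deliberate simplification, not mutt's actual parsing code.  Comparing its count with `grep -c '^From '` shows how a single bogus CL: value swallows the messages after it.

```shell
# count_cl_honoring: hypothetical helper; a simplified CL-trusting mbox reader.
# Prints how many messages it finds when Content-Length: is taken verbatim.
count_cl_honoring() {
    LC_ALL=C awk '
    # Start a new message at a From_ line, but only once any
    # Content-Length: byte budget from the previous message is used up.
    /^From / && !inhdr && skip <= 0 { n++; inhdr = 1; cl = 0; next }
    inhdr {
        if ($0 == "") { inhdr = 0; skip = cl }   # blank line ends the header
        else if (tolower($0) ~ /^content-length:/) {
            v = $0
            sub(/^[^:]*:[ \t]*/, "", v)
            cl = v + 0
        }
        next
    }
    # Body line: charge it (plus its newline) against the CL: budget.
    skip > 0 { skip -= length($0) + 1 }
    END { print n + 0 }
    ' "$@"
}
```

On a four-message test mailbox whose first message declares Content-Length: 11267 over a body of a few bytes, `grep -c '^From '` reports 4 while this reader reports 1; deleting that one CL: line brings the reader back to 4, which mirrors what pine's rewrite did for mutt.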

So, you need to either a) tell mutt to always ignore CL: fields and only
split on From_ lines, or b) filter out CL: fields in your .procmailrc.
I don't know how to do (a), but (b) is as simple as:

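        # Delete Content-Length: from the header (f: filter, h: pipe the
        # header only, w: wait for formail and check its exit status)
        :0 fhw
        * ^Content-Length:
        | formail -I Content-Length: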


You may wonder at this point why procmail considered the file as a single
message when you executed "procmail < file".  Well, procmail _always_
considers its input to be a single message.  If you want to split an mbox
format mailbox into multiple messages, you need to use formail's -s flag to
invoke procmail once for each message.  Note that like procmail, formail
will by default pay attention to CL: fields, so you should include the
-Y flag on formail's command line when splitting mailboxes that may
contain bogus CL: fields:

        formail -Y -s procmail <file

Note that a quick way to strip the CL: field from every message in
a mailbox is to give formail the "delete this header" argument while
splitting:

        formail -Y -I Content-Length: -s <file  >file.new

(If you don't specify a program to invoke on each message when splitting,
formail will just send the message to its stdout.)
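If formail isn't at hand, the effect of `formail -Y -I Content-Length: -s <file >file.new` can be roughly approximated in awk.  This is a stand-in under stated assumptions, not formail's implementation: `strip_cl` is a hypothetical name, it splits on From_ lines only (the -Y behaviour), drops Content-Length: header lines (plus any continuation lines belonging to them) from each header, and passes bodies through untouched.  formail's own header parsing is more thorough, so prefer it when available.

```shell
# strip_cl: hypothetical helper approximating
#   formail -Y -I Content-Length: -s <file >file.new
# Splits on From_ lines only and removes Content-Length: from every header.
strip_cl() {
    LC_ALL=C awk '
    # From_ line at the top of the file or after a blank line: new header.
    /^From / && (NR == 1 || prevblank) { inhdr = 1; drop = 0; prevblank = 0; print; next }
    inhdr && $0 == "" { inhdr = 0; print; next }           # end of header
    inhdr && /^[ \t]/ { if (!drop) print; next }           # continuation line
    inhdr { drop = (tolower($0) ~ /^content-length:/); if (!drop) print; next }
    { prevblank = ($0 == ""); print }                      # body passes through
    ' "$@"
}
```

Usage would be `strip_cl file >file.new`, after which mutt should split file.new on its From_ lines alone.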

Does that all make sense?


Philip Guenther
