procmail

Re: pruning mbox's based on number of messages?

1999-12-12 12:56:20
Liviu Daia <Liviu(_dot_)Daia(_at_)imar(_dot_)ro> writes:
> On 12 December 1999, era eriksson <era(_at_)iki(_dot_)fi> wrote:
> > ...
> > MIME doesn't really say anything about multiple messages in a folder.
> > As long as you're using Berkeley mbox folders, any line which starts
> > with the letters "From " is a mailbox separator.
>
>    There's always our beloved "message/rfc822".  Since RFC 2046 didn't
> bother to state whether "From_" lines are legal or not at the beginning
> of "message/rfc822" attachments, different mailers ended up doing
> different things: some of them (e.g. older versions of Netscape) will
> keep "From_" lines, while others (such as newer Netscapes) will delete
> them.  In a correctly formed attachment, such a line will be preceded by
> an empty one.  And since it obviously also has a valid "From_" syntax,
> you can't rely on parsing its content either.  Like it or not, in such
> cases "Content-Length:" is your only hint (if you insist on using mbox
> mailboxes, that is).
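A minimal Python sketch of the ambiguity described in the quoted exchange, comparing a splitter that treats every "From " line as a separator with one that trusts Content-Length. It assumes a simplified mailbox in which the forwarded message is pasted directly into the body (no MIME boundaries), Content-Length counts body bytes only, and messages are separated by a blank line. The function names, the helper make_message, and the sample addresses are all hypothetical.

```python
import re


def naive_split(mbox: str) -> list[str]:
    """Split wherever a line starts with 'From ' (the plain mbox rule)."""
    messages: list[str] = []
    current: list[str] = []
    for line in mbox.splitlines(keepends=True):
        # A "From " line starts a new message, unless nothing is buffered yet.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def content_length_split(mbox: str) -> list[str]:
    """Split using each message's Content-Length header (body bytes only)."""
    data = mbox.encode()
    messages: list[str] = []
    pos = 0
    while pos < len(data):
        if not data.startswith(b"From ", pos):
            raise ValueError(f"expected a From_ line at byte {pos}")
        header_end = data.find(b"\n\n", pos)
        if header_end == -1:
            raise ValueError("unterminated header block")
        headers = data[pos:header_end].decode()
        m = re.search(r"^Content-Length:[ \t]*(\d+)", headers,
                      re.MULTILINE | re.IGNORECASE)
        if m is None:
            raise ValueError("message has no Content-Length header")
        # The body starts after the blank line and runs for exactly N bytes.
        end = header_end + 2 + int(m.group(1))
        messages.append(data[pos:end].decode())
        pos = end
        # Skip the blank line(s) separating this message from the next.
        while data.startswith(b"\n", pos):
            pos += 1
    return messages


def make_message(sender: str, subject: str, body: str) -> str:
    # Hypothetical helper that computes Content-Length for the sample.
    return (f"From {sender} Sun Dec 12 12:00:00 1999\n"
            f"Subject: {subject}\n"
            f"Content-Length: {len(body.encode())}\n\n{body}")


# The first body carries a pasted message that kept its "From_" line,
# preceded by an empty line, as described in the quoted text.
SAMPLE = (
    make_message("alice@example.com", "fwd",
                 "See attached.\n\n"
                 "From bob@example.com Sat Dec 11 09:00:00 1999\n"
                 "Subject: hi\n\nhello\n")
    + "\n"
    + make_message("carol@example.com", "plain", "just text\n")
)

print(len(naive_split(SAMPLE)))           # 3: the pasted message is split off
print(len(content_length_split(SAMPLE)))  # 2
```

The trade-off is that the Content-Length reader trusts the header completely: if some program edits the file without updating the header, every following message is misaligned.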

Yes and no.  rfc2046 states

5.2.1.  RFC822 Subtype

   A media type of "message/rfc822" indicates that the body contains an
   encapsulated message, with the syntax of an RFC 822 message.
   However, unlike top-level RFC 822 messages, the restriction that each
   "message/rfc822" body must include a "From", "Date", and at least one
   destination header is removed and replaced with the requirement that
   at least one of "From", "Subject", or "Date" must be present.

   It should be noted that, despite the use of the numbers "822", a
   "message/rfc822" entity isn't restricted to material in strict
   conformance to RFC822, nor are the semantics of "message/rfc822"
   objects restricted to the semantics defined in RFC822. More
   specifically, a "message/rfc822" message could well be a News article
   or a MIME message.
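The header requirement quoted above can be checked mechanically. This is a small sketch using Python's standard email package; the function name is hypothetical, and it only checks the "at least one of From, Subject, or Date" rule, not any other syntax.

```python
from email import message_from_string


def meets_rfc2046_minimum(encapsulated: str) -> bool:
    """Return True if a message/rfc822 body has at least one of the
    headers RFC 2046 section 5.2.1 requires: From, Subject, or Date."""
    msg = message_from_string(encapsulated)
    # Header lookup on email.message.Message is case-insensitive and
    # returns None for headers that are absent.
    return any(msg[name] is not None for name in ("From", "Subject", "Date"))
```

Note that a leading "From " envelope line does not count as a "From:" header: the parser stores it separately as the Unix-from line.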

So, "From " lines are legal at the beginning of a message/rfc822
content.  Ideally they would be left off mail messages and included
in News articles that are so encapsulated, as "From " is not a valid
header in rfc822 messages or MIME messages but it is a valid News
article header.  However, any program that tries to parse a content
labeled message/rfc822 had better be ready to handle them.

Now, if such a message comes in, there should be no problem with
splitting it: it's the responsibility of the local delivery agent to
'adjust' a message for a mailbox format, and part of the formatting for
mbox style mailboxes is to escape all lines beginning with "From " and
generate a correct initial "From " line.  If the mailbox is in "mbox
with Content-Length: header" style, then formatting the message for
delivery means that that header should be adjusted to contain the
correct length and "From " lines are passed through unharmed.  Either
way, it is the responsibility of whatever program stores the
message to make sure the message can be correctly extracted from the
mailbox.
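The two delivery-side formats just described can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular delivery agent works: the function names are hypothetical, the "From_" line uses UTC via time.asctime (many agents write local time), the message is taken to have no envelope line of its own, and the body is everything after the first blank line.

```python
import time


def from_line(envelope_sender: str, when: float) -> str:
    # Hypothetical helper: builds the "From_" separator line. UTC is used
    # here for reproducibility; many delivery agents write local time.
    return f"From {envelope_sender} {time.asctime(time.gmtime(when))}\n"


def format_escaped(envelope_sender: str, message: str, when: float) -> str:
    """Plain mbox: prefix '>' to every line starting with 'From '."""
    lines = message.splitlines(keepends=True)
    escaped = "".join(">" + line if line.startswith("From ") else line
                      for line in lines)
    if not escaped.endswith("\n"):
        escaped += "\n"
    # A blank line separates this message from the next one.
    return from_line(envelope_sender, when) + escaped + "\n"


def format_content_length(envelope_sender: str, message: str,
                          when: float) -> str:
    """mbox with Content-Length: leave 'From ' lines alone, fix the length."""
    headers, _, body = message.partition("\n\n")
    # Drop any stale Content-Length header before adding the correct one.
    kept = [h for h in headers.split("\n")
            if not h.lower().startswith("content-length:")]
    kept.append(f"Content-Length: {len(body.encode())}")
    return (from_line(envelope_sender, when) + "\n".join(kept) + "\n\n"
            + body + "\n")
```

The escaping shown is the simple scheme from the paragraph above; the variant usually called mboxrd also adds a ">" to lines already beginning with ">From " (or ">>From ", and so on) so that the change can be undone exactly.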

Notice that programs that read mbox-style mailboxes must be prepared to
correctly handle messages that have been munged by the storage
process.  This means that if a structured message format is being used
that allows "From " at the beginning of the line, the message parser
must also accept ">From " in those cases or it won't work with messages
stored unencoded in mbox-style mailboxes.  For example, I would expect
MIME parsers that parse message/rfc822 subtypes to accept both "From "
and ">From " lines at the beginning of that content.


Philip Guenther