procmail

Re: pruning mbox's based on number of messages?

1999-12-12 14:36:58
On 12 December 1999, Philip Guenther <guenther(_at_)gac(_dot_)edu> wrote:
Liviu Daia <Liviu(_dot_)Daia(_at_)imar(_dot_)ro> writes:
On 12 December 1999, era eriksson <era(_at_)iki(_dot_)fi> wrote:
...
MIME doesn't really say anything about multiple messages in a
folder.  As long as you're using Berkeley mbox folders, any line
which starts with the letters "From " is a mailbox separator.
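
To make the rule above concrete, here is a minimal sketch of the naive
count it implies.  The function name and the sample mailbox are made up
for illustration; this is not how procmail or formail do it, just the
literal "any line starting with From_" reading.

```python
def count_from_lines(mbox_bytes: bytes) -> int:
    """Count lines beginning with b'From ' (the naive mbox separator rule)."""
    return sum(1 for line in mbox_bytes.splitlines()
               if line.startswith(b"From "))


# Hypothetical two-message mailbox; the first body has an unquoted "From " line.
sample = (
    b"From alice@example.org Sun Dec 12 14:00:00 1999\n"
    b"Subject: one\n"
    b"\n"
    b"From here on the body was never quoted.\n"
    b"\n"
    b"From bob@example.org Sun Dec 12 14:05:00 1999\n"
    b"Subject: two\n"
    b"\n"
    b"body\n"
)

print(count_from_lines(sample))  # 3, although only two messages were stored
```

The overcount is the whole point: an unquoted body line is
indistinguishable from a separator under this rule.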

   There's always our beloved "message/rfc822".  Since RFC 2046
didn't bother to state whether "From_" lines are legal or not at
the beginning of "message/rfc822" attachments, different mailers
ended up doing different things: some of them (e.g. older versions
of Netscape) will keep "From_" lines, while others (such as newer
Netscapes) will delete them.  In a correctly formed attachment, such
a line will be preceded by an empty one.  And since it obviously also
has a valid "From_" syntax, you can't rely on parsing its content
either.  Like it or not, in such cases "Content-Length:" is your only
hint (if you insist on using mbox mailboxes, that is).
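
A rough sketch of what "Content-Length:" as a hint could look like in
practice.  It assumes the header counts body bytes exactly and only
trusts it when the byte offset it points to is followed by end of file
or by a new "From_" line; otherwise it falls back to scanning for the
next "From_" line.  The function name and sample data are hypothetical,
and real mailers differ in how they write (or corrupt) this header.

```python
import re
from typing import List

CONTENT_LENGTH = re.compile(rb"^Content-Length:[ \t]*(\d+)[ \t]*$",
                            re.IGNORECASE | re.MULTILINE)


def split_mbox(data: bytes) -> List[bytes]:
    """Split mbox bytes into messages, using Content-Length when it checks out."""
    messages = []
    pos, n = 0, len(data)
    while pos < n:
        if not data.startswith(b"From ", pos):
            raise ValueError("expected a 'From ' separator at offset %d" % pos)
        hdr_end = data.find(b"\n\n", pos)
        if hdr_end == -1:                      # headers only, no body
            messages.append(data[pos:])
            break
        body_start = hdr_end + 2
        end = nxt = None
        m = CONTENT_LENGTH.search(data[pos:hdr_end])
        if m:
            cand = body_start + int(m.group(1))
            probe = cand
            while probe < n and data[probe:probe + 1] == b"\n":
                probe += 1                     # skip blank line(s) after the body
            at_line_start = data[probe - 1:probe] == b"\n"
            if cand <= n and (probe == n or
                              (at_line_start and data.startswith(b"From ", probe))):
                end, nxt = cand, probe         # the header is plausible, trust it
        if end is None:                        # no usable header: scan for "From "
            idx = data.find(b"\nFrom ", body_start - 1)
            end = nxt = n if idx == -1 else idx + 1
        messages.append(data[pos:end])
        pos = nxt
    return messages


# Hypothetical mailbox: message one carries an encapsulated message whose
# leading "From " line was kept and not quoted.
body1 = (b"intro\n\n"
         b"From carol@example.org Sat Dec 11 09:00:00 1999\n"
         b"Subject: inner\n\ninner body\n")
msg1 = (b"From alice@example.org Sun Dec 12 14:00:00 1999\n"
        b"Subject: outer\nContent-Length: %d\n\n" % len(body1)) + body1
msg2 = b"From bob@example.org Sun Dec 12 14:05:00 1999\nSubject: two\n\nbody two\n"

print(len(split_mbox(msg1 + b"\n" + msg2)))    # 2
```

The sanity check on the offset is a design choice: a stale or wrong
"Content-Length:" then degrades to the plain "From_" scan instead of
swallowing the following messages.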

Yes and no.  RFC 2046 states:

5.2.1.  RFC822 Subtype

   A media type of "message/rfc822" indicates that the body contains
   an encapsulated message, with the syntax of an RFC 822 message.
   However, unlike top-level RFC 822 messages, the restriction that
   each "message/rfc822" body must include a "From", "Date", and at
   least one destination header is removed and replaced with the
   requirement that at least one of "From", "Subject", or "Date" must
   be present.

   It should be noted that, despite the use of the numbers "822", a
   "message/rfc822" entity isn't restricted to material in strict
   conformance to RFC822, nor are the semantics of "message/rfc822"
   objects restricted to the semantics defined in RFC822. More
   specifically, a "message/rfc822" message could well be a News
   article or a MIME message.

So, "From " lines are legal at the beginning of a message/rfc822
content.

    How do you deduce that?  People with better advocacy aptitudes
than myself and too much spare time on their hands seem to have agreed
a long time ago that, in recent RFC parlance, "From_" lines are not to
be considered headers (and, to the best of my knowledge, the only RFC
that mentions "From_" in relation to mail messages is still RFC 976,
"UUCP Mail Interchange Format Standard").  So, my reading of the above
is that "From:", "Subject:", and "Date:" lines are allowed.  RFC 2046
doesn't explicitly _forbid_ "From_" lines --- but that's not the same as
explicitly stating that they are allowed.

Ideally they would be left off of mail messages and included in News
articles that are so encapsulated, as "From " is not a valid header
in RFC 822 messages or MIME messages but it is a valid News article
header.  However, any program that tries to parse a content labeled
message/rfc822 had better be ready to handle them.

    Agreed.

[...]
Notice that programs that read mbox-style mailboxes must be prepared
to correctly handle messages that have been munged by the storage
process.  This means that if a structured message format is being
used that allows "From " at the beginning of the line, the message
parser must also accept ">From " in those cases or it won't work with
messages stored unencoded in mbox-style mailboxes.  For example, I
would expect MIME parsers that parse message/rfc822 subtypes to accept
both "From " and ">From " lines at the beginning of that content.
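
As an illustration of the last point, a minimal sketch of a tolerant
reader for message/rfc822 content, using Python's standard email
package.  The helper name is hypothetical; it peels off a leading
"From " or ">From " line by hand before handing the rest to the parser,
so both the original and the mbox-munged form yield the same headers.

```python
from email import message_from_bytes


def parse_encapsulated(content: bytes):
    """Parse message/rfc822 content, tolerating a leading 'From ' or '>From ' line.

    Returns (envelope_line_or_None, parsed_message).
    """
    first, _, rest = content.partition(b"\n")
    envelope = None
    if first.startswith((b"From ", b">From ")):
        # Undo the mbox quoting and drop a possible trailing CR.
        envelope = first.lstrip(b">").rstrip(b"\r")
        content = rest
    return envelope, message_from_bytes(content)


# Hypothetical encapsulated message in its three possible shapes.
inner = b"Subject: inner\nDate: Sat, 11 Dec 1999 09:00:00 +0200\n\nhello\n"
from_line = b"From carol@example.org Sat Dec 11 09:00:00 1999\n"

for prefix in (b"", from_line, b">" + from_line):
    envelope, msg = parse_encapsulated(prefix + inner)
    print(envelope, msg["Subject"])
```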

    Indeed, a MIME parser shouldn't have any problem splitting these
messages.  But the initial topic was about counting messages in procmail.

    Regards,

    Liviu Daia

-- 
Dr. Liviu Daia               e-mail:   Liviu(_dot_)Daia(_at_)imar(_dot_)ro
Institute of Mathematics     web page: http://www.imar.ro/~daia
of the Romanian Academy      PGP key:  http://www.imar.ro/~daia/daia.asc