ietf-822

Re: Last Call: 'The APPLICATION/MBOX Media-Type' to Proposed Standard

2004-08-13 17:40:08
On Thu, 12 Aug 2004 17:18:19 EDT, Tony Hansen said:
> The information about the mbox format being anecdotally defined is 
> incorrect. The mbox format has traditionally been documented in the 
> binmail(1) or mail.local(8) man pages (BSD UNIX derivatives) or mail(1) 
> man page (UNIX System 3/5/III/V derivatives). There have been several 
> variants of the mbox format in use by those different systems. The most 
> complete description of an mbox format can be seen in the man page from 
> any UNIX System Vr4 derived system, such as Solaris.

Umm.. Tony?  I hate to say it, but if there have been several variants used in
the wild, and the man pages for said variants document different formats,
that's awfully close to "anecdotally defined" when you're doing a standard.

For example, a Solaris 8 box across the hall says in 'man mail.local':

     Each delivered mail message in the mailbox is preceded by  a
     "Unix From line" with the following format:

          From sender_address time_stamp

     The sender_address  is  extracted  from  the  SMTP  envelope
     address  (the  envelope  address  is  specified  with the -f
     option).

     A trailing blank line is also added to the end of each message.
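Taken literally, that man page text amounts to roughly the following Python sketch. The helper name `mbox_entry` is hypothetical, and the ctime-style timestamp (via `time.asctime()` in UTC) is an assumption, precisely because the man page never says what `time_stamp` looks like. It also does no '>From ' stuffing, since the man page is silent on that too.

```python
import time

def mbox_entry(sender: str, message: str, when: float) -> str:
    """Wrap one message as a single mbox entry (hypothetical helper)."""
    # "From sender_address time_stamp": the man page leaves the time_stamp
    # format open, so ctime()-style output via time.asctime() in UTC is assumed.
    from_line = f"From {sender} {time.asctime(time.gmtime(when))}\n"
    if not message.endswith("\n"):
        message += "\n"
    # No '>From ' stuffing here: the man page text says nothing about it.
    # "A trailing blank line is also added to the end of each message."
    return from_line + message + "\n"

entry = mbox_entry("alice@example.com", "Subject: hi\n\nhello\n", 526953600)
print(entry, end="")
```

Everything the man page leaves unsaid (brackets around the address, the timestamp layout, stuffing) had to be picked by the writer, which is exactly the problem.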

Hmm. Nothing about whether the sender_address is, or should be, <bracketed>.
Nothing about the format of the time_stamp. Nothing about '>From ' stuffing
(and yes, I've seen systems that don't do it at all, and systems that only
>-stuff a line if it matches a regexp for what *they* think the entire 'From '
line looks like(*)). The Sendmail 8.13.1 mail.local man page does say that
>-stuffing happens for lines "which could be mistaken for a ``From '' delimiter
line", and the code actually checks for exactly the 5 characters 'From '...
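A minimal Python sketch of the plain 5-character rule, assuming stuffing means "prefix '>' to any body line that begins with exactly 'From '". The function name is hypothetical; this is not the mail.local source.

```python
def stuff_from_lines(body: str) -> str:
    """Prefix '>' to any body line that starts with exactly 'From '."""
    out = []
    for line in body.splitlines(keepends=True):
        if line.startswith("From "):   # the 5-character check
            line = ">" + line
        out.append(line)
    return "".join(out)

print(stuff_from_lines("Hi\nFrom here on, no more\nFromage is cheese\n"), end="")
```

Note that a body containing both "From a" and ">From a" comes out as two identical ">From a" lines, so no unstuffing rule can restore the original: one more reason the format is only loosely defined.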

Any doubts that this whole mess is at best anecdotally defined can be dispelled
by mentioning "Content-Length:" (interestingly enough, not even mentioned in the
Solaris or Sendmail man pages, although the Sendmail source tree does mention
that building on Solaris 2.3 or later will turn it on). That's of interest
mostly because the Content-Length: header is so easily broken by later
>-stuffing/unstuffing or other similar conversions...
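To see how easily that breaks, here is a small Python sketch. It assumes Content-Length: counts the body's bytes as written and that stuffing happens afterwards; the names are illustrative only.

```python
import re

def stuff(body: str) -> str:
    # Prefix '>' to every line that starts with 'From '.
    return re.sub(r"(?m)^From ", ">From ", body)

body = "Hello\nFrom the desk of Bob\n"
declared = len(body.encode("utf-8"))      # value a writer would put in Content-Length:
stored = stuff(body).encode("utf-8")      # what lands in the file after stuffing
read_back = stored[:declared]             # a reader that trusts Content-Length:
print(declared, len(stored), read_back)
```

One stuffed line makes the stored body a byte longer than declared, so a reader that trusts the header cuts the message short and starts the next one a byte early.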

(*) time_stamp. Argh.  Fought with this during a data/machine migration.
Write code that will accept the 26-byte ctime format:
'Fri Sep 13 00:00:00 1986\n\0'. Works fine, once you realize that some
systems just used 'From envelope_address' without a timestamp.
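A Python sketch of that first pass: accept the ctime layout, or no timestamp at all. The function name and the strict `time.strptime` format are assumptions for illustration. Anything else (such as a timezone abbreviation) raises `ValueError`.

```python
import time
from typing import Optional, Tuple

CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. 'Fri Sep 13 00:00:00 1986'

def parse_from_line(line: str) -> Tuple[str, Optional[time.struct_time]]:
    """Split a 'From ' separator into (sender, timestamp-or-None)."""
    line = line.rstrip("\r\n")
    if not line.startswith("From "):
        raise ValueError("not a 'From ' separator line")
    sender, _, stamp = line[5:].partition(" ")
    stamp = stamp.strip()
    if not stamp:
        # Some systems wrote just 'From envelope_address' with no timestamp.
        return sender, None
    # Raises ValueError for anything that does not match CTIME_FORMAT.
    return sender, time.strptime(stamp, CTIME_FORMAT)
```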

Then I get handed this: 'Fri Aug 13 20:21:32 EDT 2004'.  Fix that, and then find
some joker running in a French locale: 'vendredi, 13 août 2004, 20:22:01 EDT'.
And yes, his b0rked software only >-stuffed 'From ' lines that regexp-matched
the *French* variant. Took me *quite* some time to twig to THAT one...
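One defensive approach, sketched in Python under stated assumptions: decide "is this a separator?" purely on the 5-character 'From ' prefix, and treat the date as best-effort. The helper below (hypothetical name) accepts ctime layout with an optional timezone abbreviation before the year, and returns None instead of failing for anything else, such as a locale-formatted date. Dropping the zone is an assumption: `strptime`'s `%Z` only recognises a handful of names, so the offset is simply lost here.

```python
import re
import time
from typing import Optional

# ctime() layout, optionally with a timezone abbreviation before the year,
# as in 'Fri Aug 13 20:21:32 EDT 2004'.
_CTIME_LIKE = re.compile(
    r"^([A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})"
    r"(?: [A-Z]{3,5})? (\d{4})$"
)

def parse_from_date(text: str) -> Optional[time.struct_time]:
    """Best-effort parse of a 'From ' line timestamp; None if unrecognised."""
    m = _CTIME_LIKE.match(text.strip())
    if m is None:
        return None  # e.g. a locale-formatted date; the line is still a separator
    # The zone abbreviation (if any) is dropped before parsing.
    try:
        return time.strptime(f"{m.group(1)} {m.group(2)}", "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
```

Keeping separator detection independent of date parsing means the writer's and reader's idea of a 'From ' line can't drift apart the way the French-locale stuffing did.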
