On Thu, 12 Aug 2004 17:18:19 EDT, Tony Hansen said:
> The information about the mbox format being anecdotally defined is
> incorrect. The mbox format has traditionally been documented in the
> binmail(1) or mail.local(8) man pages (BSD UNIX derivatives) or mail(1)
> man page (UNIX System 3/5/III/V derivatives). There have been several
> variants of the mbox format in use by those different systems. The most
> complete description of an mbox format can be seen in the man page from
> any UNIX System Vr4 derived system, such as Solaris.

Umm.. Tony? I hate to say it, but if there have been several variants used in
the wild, and the man pages for said variants document different formats,
that's awfully close to "anecdotally defined" when you're doing a standard.
For example, a Solaris 8 box across the hall says in 'man mail.local':

     Each delivered mail message in the mailbox is preceded by a
     "Unix From line" with the following format:

     From sender_address time_stamp

     The sender_address is extracted from the SMTP envelope
     address (the envelope address is specified with the -f
     option).

     A trailing blank line is also added to the end of each message.

Hmm. Nothing about whether the sender_address is, or should be, <bracketed>.
Nothing about the format of the time_stamp. Nothing about '>From ' stuffing
(and yes, I've seen systems that don't do it at all, and systems that only
do >-stuffing if the 'From ' line matched a regexp for what *they* think the
entire 'From ' line looks like(*)). The Sendmail 8.13.1 mail.local does say
that >-stuffing happens for lines "which could be mistaken for a ``From ''
delimiter line", and the code actually checks for exactly 5 chars...
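
To make the "exactly 5 chars" behavior concrete, here is a rough Python sketch. It is not the mail.local code; the names stuff_line and stuff_body are made up for illustration, and the only rule assumed is "prefix '>' when the line begins with the five characters 'From '".

```python
def stuff_line(line: str) -> str:
    """Return the line with a leading '>' if it starts with 'From '."""
    # Assumption: only the first 5 characters are examined. Nothing after
    # them (address, timestamp) is checked, unlike the regexp-based variants.
    if line[:5] == "From ":
        return ">" + line
    return line


def stuff_body(body: str) -> str:
    """Apply stuff_line to every line of a message body."""
    return "".join(stuff_line(l) for l in body.splitlines(keepends=True))


if __name__ == "__main__":
    sample = "Hi,\nFrom here on it gets worse.\nfrom lowercase is untouched\n"
    print(stuff_body(sample), end="")
```

Under this rule any body line starting with 'From ' is stuffed, whether or not the rest of it resembles a delimiter line.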
Any doubts that this whole mess is at best anecdotally defined can be dispelled
by mentioning "Content-Length:" (interestingly enough, not even mentioned in the
Solaris or Sendmail man pages, although the Sendmail source tree does mention
that building on Solaris 2.3 or later will turn it on). Of interest mostly
because the Content-Length: is so easily broken by later >-stuffing/unstuffing
or other similar conversion...
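
A small Python illustration of that breakage, under the assumption that Content-Length: records the body size in bytes at delivery time and that some later tool stuffs 'From ' lines without updating the header. The message body here is invented for the example.

```python
# Body as originally delivered; a Content-Length: header would record its size.
body = b"Hello,\nFrom now on this line gets stuffed.\nBye.\n"
declared = len(body)

# A later tool does >-stuffing on the body but leaves the header alone.
stuffed = b"".join(
    (b">" + line) if line.startswith(b"From ") else line
    for line in body.splitlines(keepends=True)
)

# Each stuffed line adds one byte, so the recorded length is now too short.
drift = len(stuffed) - declared

# A reader that trusts the stale length stops before the real end of the body.
seen_by_reader = stuffed[:declared]

if __name__ == "__main__":
    print("declared:", declared, "actual:", len(stuffed), "drift:", drift)
```

With one stuffed line the drift is a single byte, which is enough to make a length-based reader look for the next 'From ' delimiter in the wrong place.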
(*) time_stamp. Argh. Fought with this during a data/machine migration.
Write code that will accept a 26 byte ctime format:
'Fri Sep 13 00:00:00 1986\n\0'.
Works fine once you realize that some systems just used 'From envelope_address'
without a timestamp.
Then I get handed this: 'Fri Aug 13 20:21:32 EDT 2004'. Fix that, and find some
joker running in a French locale: 'vendredi, 13 août 2004, 20:22:01 EDT'.
And yes, his b0rked software only >-stuffed 'From ' lines that regexp-matched
the *French* variant. Took me *quite* some time to twig into THAT one...
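
For illustration, a minimal Python sketch of a classifier for the variants listed above. The function name classify_from_line, the category labels, and the example address are all hypothetical, and the patterns only cover the forms mentioned here (no timestamp, plain ctime, ctime with a timezone name); anything else, such as the French-locale stamp, falls through as unrecognized.

```python
import re

# 'From ' + sender, optionally followed by whatever the system used as a stamp.
FROM_LINE = re.compile(r"^From (\S+)(?:\s+(.+))?$")

# ctime-style stamp, with an optional timezone name before the year.
# Groups: 1=weekday 2=month 3=day 4=hh:mm:ss 5=timezone (optional) 6=year
CTIME = re.compile(
    r"^([A-Z][a-z]{2}) ([A-Z][a-z]{2}) ([ \d]\d) "
    r"(\d\d:\d\d:\d\d) (?:([A-Z]{3,5}) )?(\d{4})$"
)


def classify_from_line(line):
    """Return (sender, kind) for a 'From ' line, or None if it is not one."""
    m = FROM_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    sender, stamp = m.group(1), m.group(2)
    if stamp is None:
        return (sender, "no-timestamp")
    c = CTIME.match(stamp)
    if c is None:
        return (sender, "unrecognized")
    return (sender, "ctime+tz" if c.group(5) else "ctime")


if __name__ == "__main__":
    for text in (
        "From user@example.org Fri Sep 13 00:00:00 1986\n",
        "From user@example.org\n",
        "From user@example.org Fri Aug 13 20:21:32 EDT 2004\n",
        "From user@example.org vendredi, 13 août 2004, 20:22:01 EDT\n",
    ):
        print(classify_from_line(text))
```

The patterns deliberately avoid locale-dependent parsing; a real migration tool would still have to decide what to do with the unrecognized cases.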