On Aug 24 2004, Charles Lindsey wrote:
>> - recommend base64 encoding as a default for transport of mbox files
>> over email, since the encoder typically has no idea about the message
>> contents and whether they might contain binary data.
> No. In the common case where the 'blobs' are in fact RFC 2822 + MIME
> messages, they will already contain their own CTEs. And if they
> occasionally contain raw binary, then the sender (who knows where he got
> them from) is in the best position to decide whether an overall base64 is
> necessary.
What bothers me (and I accept this may be a purely theoretical
objection) is the potential for disparate mbox files to be effectively
merged and mixed.
It is one thing to send an mbox file to somebody, who saves it to disk
as a separate file, which can be opened/read/written, or
converted/imported to another format etc. This sort of thing has been
done in the past, and works quite well by using
application/octet-stream.
However, unlike MIME, mbox is not a recursive format. It is a linear
list of "blobs". Thus if you send an mbox attachment without encoding
the delimiters within the attachment (i.e. the "From_" lines etc.), then
the common practice of appending such a message to the mail spool file
can automatically destroy the original structure.
Instead of seeing a single message which contains an mbox attachment,
the appended spool is now seen to contain several messages, first the
actual message received, followed by the messages belonging to the
mbox attachment, as if they had been received individually.
[message 1 | message 2] + (message 3 + attachment[message 4 | message 5])
=
[message 1 | message 2 | message 3 | message 4 | message 5]
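To make the effect concrete, here is a minimal Python sketch. It assumes
a naive reader that treats every line beginning with "From " as a blob
delimiter; the addresses, dates and the helper name count_messages are
made up for illustration.

```python
def count_messages(mbox_text):
    """Count blobs as a naive reader would: one per line starting with 'From '."""
    return sum(1 for line in mbox_text.splitlines() if line.startswith("From "))

# Existing spool holding message 1 and message 2 (hypothetical contents).
spool = (
    "From alice@example.org Tue Aug 24 10:00:00 2004\n"
    "Subject: message 1\n\nbody 1\n\n"
    "From bob@example.org Tue Aug 24 10:05:00 2004\n"
    "Subject: message 2\n\nbody 2\n\n"
)

# The mbox file being sent, holding message 4 and message 5.
attachment = (
    "From carol@example.org Mon Aug 23 09:00:00 2004\n"
    "Subject: message 4\n\nbody 4\n\n"
    "From dave@example.org Mon Aug 23 09:30:00 2004\n"
    "Subject: message 5\n\nbody 5\n\n"
)

# Message 3 carries the attachment unencoded, so its "From " lines stay visible.
message3 = (
    "From erin@example.org Tue Aug 24 11:00:00 2004\n"
    "Subject: message 3\n"
    "Content-Type: application/octet-stream\n"
    "\n" + attachment
)

before = count_messages(spool)            # two messages in the spool
after = count_messages(spool + message3)  # delivery by plain append
print(before, after)
```

The recipient received one new message, so the spool ought to hold
three; the naive count after the append is five.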
Coping with such a case means making mbox parsers much smarter than
they currently are, or else making mbox writers smarter (so that the spool
is unambiguously appended).
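As one illustration of the "smarter writer" option, some mbox variants
quote body lines that look like delimiters by prefixing ">" to them.
The sketch below assumes that convention in its reversible form (every
line matching ">*From " gains one more ">"); the helper names
quote_body and unquote_body are hypothetical, and this is only one of
several possible approaches.

```python
import re

def quote_body(body):
    """Prefix '>' to every body line that starts with zero or more '>' then 'From '."""
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)

def unquote_body(body):
    """Reverse quote_body by stripping exactly one leading '>' from quoted lines."""
    return re.sub(r"^>(>*From )", r"\1", body, flags=re.MULTILINE)

body = "From carol@example.org Mon Aug 23 09:00:00 2004\n>From here on\nnot From here\n"
quoted = quote_body(body)

# No line of the quoted body can be mistaken for a delimiter any more.
assert not any(line.startswith("From ") for line in quoted.splitlines())
# The original body is recoverable by a reader that knows the convention.
assert unquote_body(quoted) == body
```

Note that this changes the bytes of the attachment as stored in the
spool, so it only helps if the reader applies the reverse step.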
Either way, this is a new difficulty I think, specifically introduced
by unencoded mbox file attachments (because the "From_" delimiters of
the attachment bleed through to the outer mbox containing the received
message). It can be prevented simply by making sure that "From_"
lines in the attachment are unrecognizable (encoded).
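A minimal Python sketch of that prevention, assuming base64 as the
encoding: since the base64 alphabet contains no space character, no
encoded line can begin with "From ". The sample mbox text and the
helper name has_from_line are made up for illustration.

```python
import base64

def has_from_line(text):
    """True if any line could be taken for an mbox 'From ' delimiter."""
    return any(line.startswith("From ") for line in text.splitlines())

# Hypothetical two-message mbox file to be attached.
attachment = (
    "From carol@example.org Mon Aug 23 09:00:00 2004\n"
    "Subject: message 4\n\nbody 4\n\n"
    "From dave@example.org Mon Aug 23 09:30:00 2004\n"
    "Subject: message 5\n\nbody 5\n\n"
)

# Encode the whole attachment; encodebytes wraps the output into short lines.
encoded = base64.encodebytes(attachment.encode("ascii")).decode("ascii")

# Decoding restores the attachment byte for byte.
decoded = base64.decodebytes(encoded.encode("ascii")).decode("ascii")

print(has_from_line(attachment), has_from_line(encoded), decoded == attachment)
```

The delimiters are hidden from the outer mbox while the attachment
remains intact for whoever decodes it.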
Another consequence of this difficulty is that you can have mbox files
whose blob delimiters suddenly change partway through, when the
conventions used by an unencoded, attached, mbox file take over
temporarily. This can cause more trouble for existing mbox parsers.
--
Laird Breyer.