ietf-822

Re: Last Call: 'The APPLICATION/MBOX Media-Type' to Proposed Standard

2004-08-16 18:01:34

On Aug 16 2004, Bruce Lilly wrote:

Specifically, From stuffing and any special treatment of control
characters can cause problems with delimiters.  In the specific
case of a Content-Length delimiter, *any* change to the content
which changes its length (e.g. adding or removing trailing
whitespace on lines) will cause problems.

Agreed.

Note that "From" is valid base64 output and that transport can
result in addition of trailing whitespace.

Ah, you're right! A single 3-byte attachment could fake a minimal
From_ delimiter (its encoding would have the form empty line + "From" +
space padding + empty line). Any bigger attachment wouldn't, if the
Base64 is properly formatted. So any nonempty (i.e. containing
at least one message header) mbox attachment would still be safe.
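
To make the 3-byte case concrete, here is a minimal sketch using Python's
standard base64 module. The byte values are simply what the four characters
"From" decode to; the variable names are illustrative only.

```python
import base64

# The three bytes whose Base64 encoding is exactly the four characters "From".
payload = bytes([0x16, 0xBA, 0x26])

encoded = base64.b64encode(payload).decode("ascii")
print(encoded)

# Space is not in the Base64 alphabet, so the encoder itself never emits
# "From "; the trailing whitespace has to be added later, e.g. in transport.
assert " " not in encoded
```

An encoder emits only "From" here; it becomes a From_ look-alike only if
whitespace is appended afterwards and the line sits after an empty line.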

This could make proper parsing of, e.g., a QP encoding of
an mbox file attached to a message saved in an mbox archive
needlessly difficult for common tools such as formail, especially
if the tool happens not to be MIME-aware.

I'm not sure I understand your point; processing of an attached
media type has several prerequisites:
1. a user indicates that processing may proceed (N.B. no
   automatic execution!)
2. a MIME-aware application decodes any transfer encoding &
   extracts the attachment
3. some application capable of processing the content handles the
   extracted data; it need not be MIME-aware
The MIME-aware application (typically a MUA) may pass
additional information (from parameters) to the media-specific
application.


I'm thinking of the case of an application which iterates over an mbox
file extracting messages for processing (perhaps passing them to a
MIME-aware utility for display or other handling).

If the extractor itself isn't MIME-aware, all spurious From_ lines can
cause difficulty (e.g. do you also check if the following lines are
standard headers before accepting this as a message boundary?).  This
difficulty arises before any MIME-aware tool is handed whatever
fragment is deemed to be a full message.
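
The header check mentioned above could be sketched as follows. This is a
hypothetical helper, not taken from any existing tool: the name
looks_like_boundary and the exact heuristic (blank line before, header-like
line after) are assumptions made for illustration.

```python
import re

# A header field name: printable US-ASCII except ":" and space, then a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")

def looks_like_boundary(lines, i):
    """Guess whether lines[i] is a real From_ separator (hypothetical heuristic)."""
    if not lines[i].startswith("From "):
        return False
    # A separator should start the file or follow an empty line.
    if i > 0 and lines[i - 1].strip() != "":
        return False
    # Extra check: the next line should look like a header field.
    return i + 1 < len(lines) and HEADER_LINE.match(lines[i + 1]) is not None

lines = [
    "From alice@example.com Mon Aug 16 18:01:34 2004",
    "Subject: hi",
    "",
    "body",
    "",
    "From here on it is body text",
    "more body",
]
print([i for i in range(len(lines)) if looks_like_boundary(lines, i)])
```

Note that a heuristic like this rejects ordinary prose starting with
"From ", but an unescaped mbox attachment would still pass it, since its
From_ lines are followed by genuine headers.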

Of course, it can be argued that either the spurious From_ lines would
always be escaped by whoever created the mbox file, or that MIME is
sufficiently old that any decent extracting tool should be aware of
attachment formats and boundary lines.

Perhaps formail is a bad example, but the man page on my system
states:

     The regular expression that is used to find `real' postmarks is:
              "\n\nFrom [\t ]*[^\t\n ]+[\t ]+[^\n\t ]"

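As a rough illustration, the expression quoted above can be tried out with
Python's re module. This assumes the pattern carries over unchanged to
Python's regex syntax; it says nothing about what formail does beyond this
one expression, and the sample strings are made up.

```python
import re

# The postmark expression from the man page, written as a Python raw string.
POSTMARK = re.compile(r"\n\nFrom [\t ]*[^\t\n ]+[\t ]+[^\n\t ]")

# A typical separator: empty line, then "From sender date".
real = "end of body\n\nFrom alice@example.com Mon Aug 16 18:01:34 2004\nSubject: x\n"

# The 3-byte Base64 case: "From" plus space padding, then an empty line.
padded = "Content-Transfer-Encoding: base64\n\nFrom   \n\n--boundary\n"

# Ordinary body text that happens to start a paragraph with "From ".
prose = "some text\n\nFrom here on, the body continues.\n"

for name, sample in [("real", real), ("padded", padded), ("prose", prose)]:
    print(name, POSTMARK.search(sample) is not None)
```

Under these assumptions the expression rejects "From" followed only by
padding, because it requires two whitespace-separated tokens after "From ",
but it still accepts unescaped prose such as the third sample.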

Also, it's worth pointing out that a QP encoding of "From " can be
either "From " (i.e. identical output) or "From=20". The latter
would be much preferable, but the choice is up to the encoding
software, which probably doesn't know it's encoding a mail message.
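
A small sketch of the two equivalent encodings, using Python's standard
quopri module for decoding. The escape_from helper is hypothetical and only
shows what an mbox-aware encoder could do; it ignores the 76-character line
limit, which a real encoder would have to respect.

```python
import quopri

# Both spellings decode to the same content.
plain = quopri.decodestring(b"From me to you\n")
escaped = quopri.decodestring(b"From=20me to you\n")
print(plain == escaped)

def escape_from(qp_line: bytes) -> bytes:
    """Hypothetical helper: hide a leading "From " in an already QP-encoded line."""
    if qp_line.startswith(b"From "):
        return b"From=20" + qp_line[5:]
    return qp_line

safe = escape_from(b"From me to you\n")
print(safe)
```

The escaped line no longer begins with "From ", so a non-MIME-aware
extractor has nothing to mistake for a postmark, while any QP decoder
recovers the original text.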

-- 
Laird Breyer.

