Re: 2822 revised grammar


On Sun July 24 2005 12:48, Frank Ellermann wrote:


Bruce Lilly wrote:

 [limit 76]

Unfolding, refolding, and other modification needs to take
that into account.  It's not a "bug", it's merely a fact that
MIME is widely used and cannot be ignored.


Widely used _because_ it can be ignored.  Relays should not try
to unfold or refold header fields; it's not their business.


List expanders sometimes insert " [list-name]" in the Subject field;
as that lengthens the line, the 76-character limit needs to be taken
into account if there are any existing encoded-words.  Likewise for UAs
which prepend noise to the Subject field (e.g. "FWD: " etc.) when
composing responses.  It's not only a relay issue.
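
A minimal sketch of the check a list expander or UA would need after
lengthening a Subject field, assuming Python and a deliberately loose
pattern for encoded-words.  The names ENCODED_WORD, MAX_LINE, and
overlong_lines are made up for illustration; only the 76-character
per-line limit comes from RFC 2047.

```python
import re

# Loose approximation of an RFC 2047 encoded-word:
# "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")

# RFC 2047: a header line containing an encoded-word is limited to 76 chars.
MAX_LINE = 76


def overlong_lines(folded_field):
    """Return the physical lines of a folded header field that contain
    an encoded-word and are longer than MAX_LINE characters."""
    return [
        line
        for line in folded_field.split("\r\n")
        if len(line) > MAX_LINE and ENCODED_WORD.search(line)
    ]


# A Subject line that is exactly at the limit before modification.
original = "Subject: =?iso-8859-1?Q?" + "a" * 50 + "?="
# The same field after a list expander inserts its tag.
tagged = original.replace("Subject: ", "Subject: [list-name] ", 1)
```

Here overlong_lines(original) is empty, while overlong_lines(tagged)
reports the line, which would then have to be refolded at a point
outside the encoded-word.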

If they do it anyway (= wannabe gateway) they can ignore the
limit 76.


What, and convert a MIME-conforming message into junk so that the
recipient can't use it?

BTW, I just stumbled over another obscure limit: a
boundary is limited to 70 characters.


Obscure?  It's in RFC 2046 and was in its predecessors.
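
For reference, the RFC 2046 rule is boundary := 0*69<bchars>
bcharsnospace, i.e. 1 to 70 characters with no trailing space.  A rough
Python sketch of that check follows; the names BCHARSNOSPACE, BOUNDARY,
and is_valid_boundary are illustrative only.

```python
import re

# bcharsnospace from RFC 2046: DIGIT / ALPHA / "'" / "(" / ")" / "+" /
# "_" / "," / "-" / "." / "/" / ":" / "=" / "?"
BCHARSNOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace  (bchars adds SPACE)
BOUNDARY = re.compile(r"[%s ]{0,69}[%s]" % (BCHARSNOSPACE, BCHARSNOSPACE))


def is_valid_boundary(value):
    """True if value is 1..70 bchars and does not end in a space."""
    return BOUNDARY.fullmatch(value) is not None
```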

BTW, your <obs-utext> is _very_ different from 2822, you have
only utext or NUL, 2822 has also %d127, bare CR, and bare LF.


%d127 is in NO-WS-CTL (in utext).  Lone CR and lone LF are
unconditionally prohibited in the message header.  obs-text permits
them (though it isn't used anywhere; there should probably be some
provision in "body" (for binary message content [*])).

 encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
 charset = token    ; see section 3
 encoding = token   ; see section 4
[...]
 encoded-text = 1*<Any printable ASCII character other than "?"
                   or SPACE>


That permits '(', ')', and '\', which won't work in comments.  Also
'.' and others prohibited in phrases.

No "?".  Your <cchar> in <cew> allows "?", and that's wrong.
It doesn't depend on the encoding in section 4.  Even if we'd
invent =?us?X-BLURF?whatever?= now we cannot use a "?" in the
<encoded-text>.

It is absolutely related to the encoding


It's not, neither SPACE nor "?" is always REQUIRED.


Can't happen with B encoding.  SP[ACE] can't happen with Q encoding.

At minimum, a separate rule would have to be formulated.


No, cchar IS already the separate rule.  You just have to move
"?" from cchar to ctext:

 cchar = %d33-39 / %d42-62 / ; Printable US-ASCII characters
         %d64-91 / %d93-126  ; excl. "(", ")", "?", or "\"

 ctext = NO-WS-CTL / cchar / "?" ; add "?" here
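
To make the effect of that proposal concrete, here is a small Python
sketch that expands the two rules into character sets.  It assumes the
last range is meant as %d93-126 and takes NO-WS-CTL from RFC 2822
(%d1-8 / %d11 / %d12 / %d14-31 / %d127); the helper name ranges and
the set names are made up for the example.

```python
def ranges(*pairs):
    """Expand inclusive (low, high) code point pairs into a set of chars."""
    return {chr(c) for low, high in pairs for c in range(low, high + 1)}


# NO-WS-CTL as defined in RFC 2822
NO_WS_CTL = ranges((1, 8), (11, 12), (14, 31), (127, 127))

# Proposed cchar: printable US-ASCII excluding "(", ")", "?", and "\"
CCHAR = ranges((33, 39), (42, 62), (64, 91), (93, 126))

# Proposed ctext: "?" is added back here instead of living in cchar
CTEXT = NO_WS_CTL | CCHAR | {"?"}
```

With these sets, "?" is in CTEXT but not in CCHAR, so a rule built only
from cchar (such as <cew>) cannot contain a "?", while ordinary comment
text still can.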


Maybe, but that obscures the meaning; cchar => characters allowed in
comments.

The only other xref of <cchar> is <cew>, and that's precisely
where you don't want the "?".  Both <uew> and <pew> are okay.

There would be a temptation to have not one, but two rules
because the characters are different for Q and B encoding


NAK.  Restrictions depending on the encoding are only relevant
for decoders, not for your purposes.  You're not interested in
decoding this crap; you only want to parse it.  But to parse it
you need to know the 2047 "?" rule, it's essential.
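
As an illustration of parsing without decoding, a hedged Python sketch
that splits an encoded-word purely on its "?" delimiters.  The pattern
is looser than RFC 2047 for charset and encoding (it does not exclude
the especials), and split_encoded_word is a hypothetical helper name.

```python
import re

# Printable US-ASCII other than "?" or SPACE: %d33-62 / %d64-126
NO_QMARK = r"[!->@-~]+"

ENCODED_WORD = re.compile(
    r"=\?(?P<charset>%s)\?(?P<encoding>%s)\?(?P<text>%s)\?="
    % (NO_QMARK, NO_QMARK, NO_QMARK)
)


def split_encoded_word(word):
    """Return (charset, encoding, encoded-text), or None if word is not a
    well-formed encoded-word.  The encoding is never interpreted."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return None
    return match.group("charset"), match.group("encoding"), match.group("text")
```

The parser never looks at what the encoding is, so an unknown one such
as X-BLURF parses fine, but a "?" inside the encoded-text makes the
word unparseable whatever the encoding.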


Perhaps, but a separate rule would be the way to go.
---------
* perfectly legal for MIME with Content-Transfer-Encoding: binary, so
taking MIME into account should be part of "body", i.e. permitted for
generation.  If one puts on blinders and pretends MIME doesn't exist,
the tendency would be to leave it out (as is currently the case) or
put it in an obs- rule.  Another reason to take MIME into account when
updating 2822.