Re: 2822 revised grammar


Bruce Lilly wrote:

In syntax it is, <word> is an "abbreviation" for
<atom> or <quoted-string>.

No sale.


Fine, also true for your offer to treat the name <CFWS> as an
"abbreviation" different from other names like <word>, <blurf>,
or <sugar>.

Among the symbols defined in RFC 822 and referenced in this
memo are: 'addr-spec', 'atom', 'CHAR', 'comment', 'CTLs',
'ctext', 'linear-white-space', 'phrase', 'quoted-pair'.
'quoted-string', 'SPACE', and 'word'.


CHAR is bad => 822 says ASCII, 2234(bis) says ASCII minus NUL.
Fortunately 2045 uses only CHAR excl. CTL, WSP, or <tspecials>.

the revised grammar attempts a solution by clearly defining
the different types of encoded-words (in phrases, in
comments, and in unstructured fields)


Yes, I've seen <cew>, <pew>, and <uew> in your text.

It would still require careful rewording of a 2047 successor.
Because of the interdependence, such issues need to be
considered when the 2822 successor is produced.


Considerations are fine, only solving it in the same text could
be too much.  After reading draft-klensin-emailaddr-i18n-03.txt
I'm in my "grumpier than anybody claiming to be grumpy" mood:

John proposes some kind of "FUSSP" without the critical second
"S" for I18N.  At least he wants to get rid of <quoted-pair> -
if I understood it correctly, it's a plan for the 22nd century.

 [limit 76]

Unfolding, refolding, and other modification needs to take
that into account.  It's not a "bug", it's merely a fact that
MIME is widely used and cannot be ignored.


Widely used _because_ it can be ignored.  Relays should not try
to unfold or refold header fields; it's not their business.

If they do it anyway (= wannabe gateway) they can ignore the
limit 76.  BTW, I just stumbled over another obscure limit, a
boundary is limited to 70 characters.  Too many magic numbers,
too many subsets of ASCII, too many RFCs, it's a royal PITA.

It is unclear whether or not
   Subject:=?us?q?foo_bar?=
is legal


First guess: yes.  No %d127, no NUL, no bare CR, no bare LF =>
no problem with 822 vs. 2822.  Or 2821, why on earth has 2821
%d127 as an ordinary instead of a control character?

BTW, your <obs-utext> is _very_ different from 2822, you have
only utext or NUL, 2822 has also %d127, bare CR, and bare LF.

It's unclear whether
   Subject:foo
    =?us?q?bar_baz?=
is legal, because the encoded word is separated from the
text "foo" not by linear-white-space but by line folding.


My model of FWS is "the same as LWSP, but at most one CRLF".

Actually it's an argument similar to the one you used for FWS
wrt CFWS; my justification is "different standards, different
authors".
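
A minimal sketch of that FWS model in Python, assuming it means the
non-obsolete 2822 form ([*WSP CRLF] 1*WSP): optional white space plus
at most one CRLF, then at least one WSP.  The name FWS_MODEL is only
illustrative, and the sketch does not settle whether the folded
Subject example is legal; it only shows that the fold matches this
model.

```python
import re

# Assumption: "the same as LWSP, but at most one CRLF" is read as
# the non-obsolete RFC 2822 form  FWS = [*WSP CRLF] 1*WSP.
FWS_MODEL = re.compile(r"(?:[ \t]*\r\n)?[ \t]+")

def leading_fws(s):
    """Return the FWS at the start of s, or None if there is none."""
    m = FWS_MODEL.match(s)
    return m.group() if m else None

# The text that follows "foo" in the folded Subject example.
rest = "\r\n =?us?q?bar_baz?="
print(repr(leading_fws(rest)))
```

Under this model a plain SPACE and a single fold are both accepted,
while a CRLF with no following WSP is not.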

Yes, but no "?" is a _general_ rule for <encoded-text> in
chapter 2, it does not depend on the encoding.

?!?


 encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
 charset = token    ; see section 3
 encoding = token   ; see section 4
[...]
 encoded-text = 1*<Any printable ASCII character other than "?"
                   or SPACE>

No "?".  Your <cchar> in <cew> allows "?", and that's wrong.
It doesn't depend on the encoding in section 4.  Even if we'd
invent =?us?X-BLURF?whatever?= now we cannot use a "?" in the
<encoded-text>.
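
A minimal sketch in Python of a parser for the 2047 rules quoted
above.  It only parses and does not decode, and the function name
parse_encoded_word is illustrative.  The <especials> list is taken
from 2047 section 2; the 75 character length limit of 2047 is not
checked here.

```python
# token: any CHAR except SPACE, CTLs, and especials (RFC 2047, section 2)
ESPECIALS = set('()<>@,;:"/[]?.=')
PRINTABLE = {chr(c) for c in range(33, 127)}
TOKEN_CHARS = PRINTABLE - ESPECIALS
# encoded-text: any printable ASCII character other than "?" or SPACE
TEXT_CHARS = PRINTABLE - {"?"}

def parse_encoded_word(s):
    """Return (charset, encoding, encoded_text), or None if s is
    not an encoded-word.  The encoding is not interpreted."""
    if len(s) < 4 or not (s.startswith("=?") and s.endswith("?=")):
        return None
    parts = s[2:-2].split("?")
    if len(parts) != 3:
        # A "?" inside the encoded-text yields extra parts.
        return None
    charset, encoding, text = parts
    for field, allowed in ((charset, TOKEN_CHARS),
                           (encoding, TOKEN_CHARS),
                           (text, TEXT_CHARS)):
        if not field or not set(field) <= allowed:
            return None
    return charset, encoding, text

print(parse_encoded_word("=?us?q?foo_bar?="))
print(parse_encoded_word("=?us?X-BLURF?what?ever?="))
```

The second call is rejected although X-BLURF is an unknown encoding:
the "?" is refused by the general rule, before any encoding-specific
rule could apply.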

It is absolutely related to the encoding


It's not, neither SPACE nor "?" is always REQUIRED.

At minimum, a separate rule would have to be formulated.


No, cchar IS already the separate rule.  You just have to move
"?" from cchar to ctext:

 cchar = %d33-39 / %d42-62 / ; Printable US-ASCII characters
         %d64-91 / %d93-126  ; excl. "(", ")", "?", or "\"

 ctext = NO-WS-CTL / cchar / "?" ; add "?" here

The only other xref of <cchar> is <cew>, and that's precisely
where you don't want the "?".  Both <uew> and <pew> are okay.
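
A small sketch in Python that checks the proposed split as character
sets.  NO-WS-CTL is assumed to be the 2822 definition (%d1-8 / %d11 /
%d12 / %d14-31 / %d127), which may differ from the revised grammar;
the helper chars() is only illustrative.

```python
def chars(*ranges):
    """Build a set of characters from inclusive (low, high) code ranges."""
    return {chr(c) for low, high in ranges for c in range(low, high + 1)}

# cchar with "?" (%d63) removed, as proposed above.
CCHAR = chars((33, 39), (42, 62), (64, 91), (93, 126))

# Assumption: NO-WS-CTL as defined in RFC 2822.
NO_WS_CTL = chars((1, 8), (11, 12), (14, 31), (127, 127))

# ctext gets the "?" back explicitly.
CTEXT = NO_WS_CTL | CCHAR | {"?"}

# For comparison: ctext = NO-WS-CTL / %d33-39 / %d42-91 / %d93-126 (RFC 2822).
RFC2822_CTEXT = NO_WS_CTL | chars((33, 39), (42, 91), (93, 126))

print(sorted(chars((33, 126)) - CCHAR))
print(CTEXT == RFC2822_CTEXT)
```

With the "?" moved, <ctext> covers the same characters as the 2822
<ctext> under this assumption, while <cchar> no longer admits "?".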

There would be a temptation to have not one, but two rules
because the characters are different for Q and B encoding


NAK.  Restrictions depending on the encoding are only relevant
for decoders, not for your purposes.  You're not interested in
decoding this crap, you only want to parse it.  But to parse it
you need to know the 2047 "?" rule, it's essential.  Bye, Frank