ietf-smtp

Re: Last Call: draft-klensin-rfc2821bis: Permitted and prohibited characters in addresses

2008-01-04 11:19:51

Hi.

I'm not copying the IETF list on this because the discussion has
already gotten too noisy as a consequence of people not reading
the text carefully enough (sadly enough, myself included).

First, please remember that the syntax of an address in 821 and
its successors has always been more restrictive than the
address syntax of 822 and its lineage.  The most obvious example
is that 822 permits personal name phrases and 821 does not.
While some of the other differences are probably just historical
accidents, others are due to the fact that, at one time, a large
number of non-Internet mail systems used an 822, or 822-like,
header structure and it seemed better to include than to ban
them.   Once messages passed from those systems into the public
Internet, the more restrictive Internet rules for the mailbox
name itself applied.  The text about gateways in RFC 1123
(especially Section 5.3.7) and text in 2821 should make that
abundantly clear.

In principle, we could announce that non-Internet mail
(presumably other than things covered by USEFOR) is ancient
history and go back and eliminate the differences.  But, even if
there were not philosophical reasons to not try to do that (and
I think there are), trying to get all the ducks lined up, verify
that none of the edge cases are problems and that all of the
"conform only to one" systems are completely gone (or otherwise
account for them) is not a trivial matter and is certainly not
appropriate for a Proposed -> Draft transition.   The
differences are inconvenient for those who believe there aren't
any; they are not a sign that anything is broken.

Second, while we have been moving increasingly in the direction
of having as much specification in syntax productions as
possible and in having those productions constitute a complete
grammar of the relevant protocol, there has never been a
requirement that specifications be written that way.   RFC 821
was certainly not written that way and 2821 isn't either.  For
them, there is no top-level node from which everything else
descends.  Syntax productions are used to _illustrate_ what a
command looks like, not to completely define its grammar.  The
grammar is defined by a combination of

        * syntax productions
        * narrative rules embedded as comments in the syntax
           tables
        * prose text in the document that restricts or 
          qualifies the syntax productions.

In the particular case of permitted characters in mailbox
local-parts, the transformation between the BNF of 821 and the
revised ABNF of 2821 and the desire to minimize the number of
basic grammar definitions that were duplicated between 2821 and
2822 seem to have introduced some additional confusion.  But,
because of the prose (if anyone actually reads it), it is clear
that the rules are different from the syntax alone.

In particular:

(1) Alexey, as Frank points out, there is no prohibition on the
use of space in local-parts in 2821, but those spaces must be
quoted. In other words, 
   RCPT TO:<"Acct\ UserID"@domain>     and
   RCPT TO:<"Acct UserID"@domain>
are valid (although the first one is discouraged now) but
   RCPT TO:<Acct UserID@domain>
is not and has never been.  See (3) below before you reread the
current text.


(2) Despite the way the 2821 syntax is constructed, with an
invocation of "qcontent" which implicitly references a
definition in 2822 and its use of the contentious NO-WS-CTL,
there is very clear (to me at least) prose in 2821 that
prohibits those controls entirely.  That is, we don't need to
have a further discussion about whether it is time to finally
get rid of them because we did it during DRUMS.   I will leave
to others whether or not that raises an interoperability
testing/documentation problem with 2821bis, but note that 2821
(and 2821bis) are written to simply move excessive
permissibility outside the scope of the standard and not to
prohibit it.  The relevant text, from Section 4.1.2 of 2821bis
(both the text and the section number are identical in 2821)
reads:

# Systems MUST NOT define mailboxes in such a way as to
# require the use in SMTP of non-ASCII characters (octets with
# the high order bit set to one) or ASCII "control characters"
# (decimal value 0-31 and 127).  These characters MUST NOT be
# used in MAIL or RCPT commands or other commands that require
# mailbox names.

This seems very clear to me.  While it introduces an additional
issue for the EAI work (already noted on their mailing list and,
by definition, not a 2821bis problem), it is not an issue here.
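
For anyone who wants to see the quoted rule in executable form,
here is a rough sketch in Python.  It is only an illustration:
the function name is hypothetical and appears in no
specification, and it assumes the mailbox is available as raw
octets.

```python
def has_forbidden_octets(mailbox: bytes) -> bool:
    """Return True if any octet falls in the ranges the quoted
    paragraph rules out: ASCII controls (decimal 0-31 and 127)
    or octets with the high-order bit set (128-255)."""
    return any(b <= 31 or b == 127 or b >= 128 for b in mailbox)
```

A mailbox such as "Acct UserID"@domain passes this check, while
one containing a TAB, a DEL, or any UTF-8 encoded non-ASCII
character does not.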


(3) While I think this all works, the observation that it has
confused, in no particular order and among others, Alexey about
spaces; Frank about NO-WS-CTL in 2821; Ned, Pete, and myself
about permitting control characters in local-parts of mailbox
names suggests to me that it is just too confusing for normal
readers. Consequently, I suggest the following two changes to
2821bis while emphasizing that these change the way in which the
specification is defined but not the spec itself.

(3.a) Change the definition of Quoted-string in 2821bis more or
less as follows (the author of the 2821 ABNF is asked to improve
this if possible).

        Quoted-string  = DQUOTE *qcontentSMTP DQUOTE
        
        QcontentSMTP   = qtextSMTP / quoted-pairSMTP
        
        quoted-pairSMTP      = %d92 %d32-126
                         ; i.e., backslash followed by any
                         ; ASCII graphic (including itself) or
                         ; SPace
        
        qtextSMTP            = %d32-91 / %d93-126
                         ; i.e., within a quoted string, any
                         ; ASCII graphic or space is permitted
                         ; without backslash-quoting except the
                         ; backslash itself.

Unless I've missed something, this exactly restores both the
substance and form of the 821 rule, adjusted to reflect the
prohibition on control characters introduced in 2821.  Since the
paragraph quoted above prohibits an SMTP-sender from ever
sending a mailbox string containing control characters, we don't
even need a way to escape them any more.
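
As a rough cross-check of the productions in (3.a), they can be
transcribed into a regular expression.  The Python sketch below
is an illustration only, and its names are hypothetical.  One
assumption to flag: read literally, the qtextSMTP range %d32-91
includes the double quote (%d34), so this sketch leaves %d34 out
of the unescaped text in order to keep the closing DQUOTE
unambiguous.

```python
import re

# qtextSMTP as proposed is %d32-91 / %d93-126.  ASSUMPTION: this
# sketch also excludes %d34 (the double quote); see the text above.
QTEXT = r'[\x20-\x21\x23-\x5b\x5d-\x7e]'

# quoted-pairSMTP = %d92 %d32-126 (backslash, then graphic or space)
QUOTED_PAIR = r'\\[\x20-\x7e]'

# Quoted-string = DQUOTE *qcontentSMTP DQUOTE
QUOTED_STRING = re.compile(r'"(?:%s|%s)*"' % (QTEXT, QUOTED_PAIR))


def is_quoted_string(s: str) -> bool:
    """Return True if s, taken as a whole, matches Quoted-string."""
    return QUOTED_STRING.fullmatch(s) is not None
```

Under these assumptions, both quoted forms from (1) are accepted
as local-parts, the unquoted "Acct UserID" is rejected, and so is
any quoted string containing a control character such as TAB.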

I'll leave it to Pete to straighten this situation out in
2822bis, noting that 2822 does not contain the clear
prohibition on the use of control characters in addresses that
2821 does.

(3.b) While (3.a) makes part of it redundant, I suggest leaving
the paragraph quoted above in the text and intact.  It contains
the explicit MUST NOT prohibition and may be more clear in
context than the ABNF comments.

(3.c) In order to reduce the odds of another round of this sort
of problem, I recommend adding the following sentences to the
last paragraph of Section 2 (the paragraph that starts "The
metalinguistic notation used..."):

                The reader is cautioned that the grammar expressed in
                the metalanguage is not comprehensive.  There are many
                instances in which provisions in the text constrain or
                otherwise modify the syntax or semantics implied by the
                grammar.


Does that help move us forward?

     john