Dusting off this particular bit of ancient history...
--On Wednesday, 17 October, 2007 09:46 -0700 Pete Resnick
I've copied this conversation over to ietf-822, since I think
that will probably be where this issue is addressed.
On 10/17/07 at 6:51 AM -0700, ned+ietf-smtp(_at_)mrochek(_dot_)com wrote:
Peter J. Holzer wrote:
Which reminds me: What is the reason for allowing control
characters in email addresses?
I try hard to stay out of these sorts of debates, but in this
case I am forced to agree. We are way past the point where
backwards compatibility is an acceptable excuse for
something like this. It should be obvious why allowing
nonprinting control characters in addresses is a bad idea on
And if common sense isn't enough here, how about the fact
that it is pretty clear that these things have obvious and
severe interoperability problems that will make it hard to
meet the criteria for draft standard?
Right now, control characters are allowed in the following:
- Domain literals on the right hand side of addresses
This is the one that affects 2821bis. The control characters
cannot appear in either IPv4 address or IPv6 address domain
literals. They can, in principle, appear in General Address
literal, but those require standards track action, etc., and one
could easily cut them off there. In other words, they are
permitted in principle, but prohibited in practice and are going
to stay that way. It is therefore probably safe to let this
Agreed. The issue is really with 2822. 2821's rules for defining new address
literals offer sufficient protection IMO.
I'm not going to do any for the version of 2821 that I intend to
post later tonight, but, if there is consensus that it is time
to do away with the puppies (and I am very much in agreement
with Ned that they should go), it may be time to invent a new
metalanguage rule for, e.g., "pcontent" (for printable) which
consists of characters in the ASCII range from 20 to 7E (hex).
Seems like a reasonable thing to do to me. The interesting thing is that
this excludes tab, and I actually think that's the right thing to do.
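To make the proposed rule concrete, here is a minimal sketch of what a "pcontent" check could look like. It assumes the rule is exactly the ASCII range %x20-7E (space through tilde) suggested above. The ABNF line and the function names are hypothetical illustrations, not text from any RFC. Note that TAB (%x09) and DEL (%x7F) both fall outside the range.

```python
# Hypothetical rule from the discussion (not defined in 2821 or 2822):
#   pcontent = %x20-7E   ; printable US-ASCII, SP included, TAB excluded
PCONTENT_MIN = 0x20
PCONTENT_MAX = 0x7E


def is_pcontent(ch: str) -> bool:
    """Return True if ch is a single character in the range %x20-7E."""
    return len(ch) == 1 and PCONTENT_MIN <= ord(ch) <= PCONTENT_MAX


def all_pcontent(text: str) -> bool:
    """Return True if every character of text matches the pcontent range."""
    return all(is_pcontent(ch) for ch in text)


if __name__ == "__main__":
    print(all_pcontent('"john doe"'))     # True: only printable ASCII
    print(all_pcontent('"john\tdoe"'))    # False: TAB is %x09
    print(all_pcontent('"john\x07doe"'))  # False: BEL is a control character
```

A check like this would let a parser reject control characters in quoted strings or domain literals outright, rather than relying on the "prohibited in practice" argument.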
Looking ahead to i18n issues, it may be desirable to have a
similar rule for Unicode characters, but the characters can't be
enumerated (in practice and to preserve version-independence) so
the rule would have to be defined in terms of character class
I haven't looked at this in much detail but I suspect it's going to be tricky
to get right. For example, the tab issue, if you'll pardon the pun, expands to
include the many other ways of representing whitespace provided by Unicode. Now
think of how this might interact with BIDI rules...
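As a rough illustration of why a class-based Unicode rule is tricky, the sketch below uses Unicode general categories as the "character class." This is an assumption: the policy of which categories to reject, and the function name, are hypothetical choices for illustration only. It rejects controls (Cc), format characters such as BIDI marks (Cf), line and paragraph separators (Zl, Zp), surrogates, private-use and unassigned code points, and every space separator (Zs) except U+0020.

```python
import unicodedata

# Hypothetical policy: general categories treated as "non-printable".
#   Cc = control, Cf = format (includes BIDI marks), Zl/Zp = line/paragraph
#   separators, Cs = surrogate, Co = private use, Cn = unassigned
REJECTED_CATEGORIES = {"Cc", "Cf", "Zl", "Zp", "Cs", "Co", "Cn"}


def is_printable_unicode(ch: str) -> bool:
    """Return True if the single character ch passes the hypothetical policy."""
    category = unicodedata.category(ch)
    if category in REJECTED_CATEGORIES:
        return False
    # Allow only ordinary SPACE among the many Unicode space separators
    # (e.g. U+00A0 NO-BREAK SPACE is category Zs and is rejected here).
    if category == "Zs" and ch != " ":
        return False
    return True


if __name__ == "__main__":
    for sample in ["a", "\u00e9", " ", "\t", "\u00a0", "\u200f"]:
        print(repr(sample), is_printable_unicode(sample))
```

The result depends on the Unicode database bundled with the Python build (`unicodedata.unidata_version`), so newly assigned characters move out of category Cn over time. That is the version-dependence problem mentioned above, and the sketch does nothing about BIDI ordering beyond rejecting explicit format marks.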
- Quoted strings (which can be on the left hand side of
addresses as well as lots of other structured header fields)
The address part of this is a little more problematic than the
case above, but also easier, under the "you can screw only
yourself by putting one of those addresses up on your server"
The rest of this is, indeed, a 2822 problem.
Yep, that's where it really needs to be dealt with.