ietf-822

Re: drums2?

2002-08-22 10:18:29

begin  quotation by Charles Lindsey on 2002/8/22 15:45 +0000:
> In <200208211545(_dot_)g7LFjL019594(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu>
> Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:
>> 2. invalidate a huge number of existing user agents by creating a new
>> syntax for local parts that is incompatible with the old one.
>
> Do we actually know what part of the huge range of possibilities within
> the present syntax of local-part are actually in regular use in existing
> user agents or in actual email addresses?

No, although in RFC 2822 we deprecated the most problematic constructs (particularly the ones permitted in RFC 822 but not in RFC 821). It's risky to retroactively declare previously compliant constructs noncompliant, even if we don't know of any specific implementations that use them. So this is a case where what's legal by the standard is a superset that includes constructs which are not a good idea to use.

If you're curious, here's a writeup I did of special characters in email addresses:

Special Characters in Email Addresses

Summary: This discusses why it's safest to stick to alphanumerics when
creating email addresses on a system.  A set of five "conditionally safe"
characters (_.-=+) are reasonable to use in the middle of an address on
most systems, although if the system's primary subaddress delimiter is one
of these, it should be forbidden.

RFC 822 permits all ASCII characters in email local-parts. RFC 2822
permits all but NUL (ASCII 0x00).  But just because a character is
permitted doesn't mean it will interoperate in practice.  The morass of
escape characters, quoting schemes and encoding schemes which impact
various parts of real-world email systems make most US-ASCII punctuation
characters problematic in practice.

Always Safe Characters: a-z A-Z 0-9
 US-ASCII alphanumerics are always safe in email address local-parts.
 Systems are permitted to be case-sensitive, but most are case-insensitive
 in practice.

Conditionally Safe Characters (avoid at beginning of local part): _.-=+
 '_' is almost always safe, but I've seen it have special meaning at the
     beginning of a local part.  Many sites use a First_Last@domain
     naming scheme so it's likely to work.
 '.' is safe between words, but requires quoting at beginning, end or if
     doubled.  Many sites use a First.Last@domain naming scheme, so it's
     likely to work.  The CMU Cyrus server uses it as a mailbox hierarchy
     delimiter, so it's forbidden in user names on that system.
 '-' is safe on systems which don't use it as a subaddress delimiter.  On
     systems which do use it as a subaddress delimiter (qmail), it'd be
     safer not to use it in user names.  Many sites use it in non-human
     mailbox names (e.g. mailing lists).  Primary subaddress use is for
     mailing list "-request" and "-owner" suffixes (RFC 2142).
 '=' is safe on most systems which don't use it as a subaddress delimiter.
     But it is a URL reserved character (see below).
 '+' is safe on most systems which don't use it as a subaddress delimiter.
     It is the primary subaddress delimiter for iPlanet Messaging Server,
     SIMS, PMDF and CMU Cyrus (among others).  It shouldn't be used when
     creating an email address for a user on these systems, but should be
     permitted in address lists and general-purpose email address entry
     forms.  Note that it is a URL reserved character and I've seen it
     cause problems on some ill-designed web forms with an email address
     entry field.  These web forms should be fixed.
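
The two categories above can be sketched as a simple check for proposed
user names.  This is only an illustration in Python: the function name,
the return labels and the subaddress_delimiter parameter are assumptions
made for the example, not part of any mail system's API.

```python
import string

ALWAYS_SAFE = set(string.ascii_letters + string.digits)
CONDITIONALLY_SAFE = set("_.-=+")


def classify_local_part(local_part, subaddress_delimiter=None):
    """Return 'safe', 'conditional' or 'avoid' for a proposed local part.

    subaddress_delimiter is the (hypothetical) site setting naming the
    system's primary subaddress delimiter; that character is rejected.
    """
    if not local_part:
        return "avoid"
    if all(ch in ALWAYS_SAFE for ch in local_part):
        return "safe"
    # Conditionally safe characters are avoided at the beginning.
    if local_part[0] not in ALWAYS_SAFE:
        return "avoid"
    # '.' would need quoting at the end or when doubled.
    if local_part.endswith(".") or ".." in local_part:
        return "avoid"
    allowed = ALWAYS_SAFE | (CONDITIONALLY_SAFE - {subaddress_delimiter})
    if all(ch in allowed for ch in local_part):
        return "conditional"
    return "avoid"
```

For example, classify_local_part("First.Last") gives "conditional", while
classify_local_part("user+tag", subaddress_delimiter="+") gives "avoid".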

Confusing: ~`
 When email addresses are written down, these characters are likely to
 be confused with other characters (~ with - and ` with ').  Thus they
 are best avoided for human factors reasons.  In addition, not everyone
 is familiar with the word 'tilde'.  While '+' and 't' can also
 be confused, I haven't had any problems in practice since I started
 accentuating the horizontal line and reducing the vertical line when
 writing '+'.

Path Delimiter: /
 May be safe as long as it is not the first character (it sometimes
 indicates direct-file delivery as a leading character).  Many Unix
 systems forbid it in user mailbox names since the names are stored
 unquoted in the filesystem and '/' is the delimiter.  Sometimes used
 as a subaddress delimiter (e.g. RFC 2846), but not by iMS.

Shell Metacharacters: $&*?!~^()[]{}"'\|
 Internet mail is often stored in Unix directories whose name is the
 unquoted user name.  Thus characters which have special meanings to Unix
 shells are discouraged because they raise security issues on some
 Unix-based mail systems.  '*', '?', and '|' are most dangerous, while
 some of the others ('~') are not a big deal when not used at the
 beginning of a user name.
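
As a rough illustration, the following Python sketch reports which of the
characters listed above appear in a user name, and uses the standard
library's shlex.quote to show how a name would have to be quoted before
being handed to a shell.  The helper name shell_hazards is made up for
this example.

```python
import shlex

# The shell metacharacters listed in this section.
SHELL_META = set("$&*?!~^()[]{}\"'\\|")


def shell_hazards(user_name):
    """Return the sorted list of shell metacharacters found in user_name."""
    return sorted(set(user_name) & SHELL_META)


def quoted_for_shell(user_name):
    """Quote user_name so a POSIX shell treats it as one literal word."""
    return shlex.quote(user_name)
```

A name with no hazards, such as "jsmith", passes through shlex.quote
unchanged, while "a*b" comes back wrapped in single quotes.  Software that
forgets this step is what makes these characters risky.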

URL Reserved Characters (RFC 2396): ;/?:@&=+$,
 These have special syntactic meaning in various portions of generic URL
 syntax and thus some web systems will choose to encode them when they're
 used for something other than their URL-specific meaning.  Thus they
 might introduce errors.

URL Excluded (RFC 2396): {}|\^[]`<>#%" Controls (0x00-0x1F, 0x7F), SPACE (0x20)
 The standard requires these to be encoded in URLs, and many webmail
 services or email address entry forms are likely to make mistakes when
 it's necessary to encode or decode these characters.
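
To make the URL problem concrete, here is a small Python sketch using the
standard urllib.parse module.  The function names are invented for the
example.  It shows correct percent-encoding of an address, and the kind of
mistake a form handler makes when it applies form-style decoding to a
value that was never form-encoded.

```python
from urllib.parse import quote, unquote, unquote_plus


def encode_for_url(address):
    """Percent-encode every reserved character in the address."""
    return quote(address, safe="")


def decode_from_url(encoded):
    """Reverse encode_for_url()."""
    return unquote(encoded)


def careless_form_decode(raw):
    """Form-style decoding, which also turns '+' into a space."""
    return unquote_plus(raw)
```

With this, encode_for_url("user+tag@example.com") yields
"user%2Btag%40example.com" and decodes back cleanly, but
careless_form_decode("user+tag@example.com") yields
"user tag@example.com", silently corrupting the address.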

Email quoting required (RFC 821/822): ()<>[]@,:;"\
 Also Controls (ASCII 0x00-1F, 0x7F) and SPACE (ASCII 0x20)
 These all require quoted local-parts which some systems don't
 implement according to the standards (e.g. Exchange), so their use
 is discouraged.  @, " and \ are particularly problematic.
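
For reference, a minimal Python sketch of the quoting involved, assuming
the RFC 822 grammar (dot-separated atoms need no quoting; anything else
becomes a single quoted-string with '"', '\' and CR backslash-escaped).
The function names are illustrative, and the sketch does not try to
cover the obsolete forms or reject eight-bit input.

```python
# RFC 822 "specials"; '.' is included because it may only appear
# between atoms, which the split below accounts for.
SPECIALS = set('()<>[]@,:;"\\.')


def is_atom(word):
    """True if word is a non-empty run of printable non-special ASCII."""
    return bool(word) and all(
        0x20 < ord(ch) < 0x7F and ch not in SPECIALS for ch in word
    )


def quote_local_part(local_part):
    """Return local_part unchanged if possible, else as a quoted-string."""
    if all(is_atom(word) for word in local_part.split(".")):
        return local_part
    escaped = "".join(
        "\\" + ch if ch in '"\\\r' else ch for ch in local_part
    )
    return '"' + escaped + '"'
```

So First.Last is left alone, while a leading dot, a doubled dot, a space
or an '@' forces the whole local part into double quotes.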

Eight-bit (RFC 821, RFC 2821):
 Characters with the high bit set are not permitted by current email
 standards. In the future, standards may be changed to permit _only_
 UTF-8 when negotiated, likely with a downconversion to a 7-bit
 encoding scheme yet to be determined (likely the same one used for
 international domain names).

C: NUL (ASCII 0x00)
 The NUL character is used to terminate C/C++ strings.  Since the majority
 of Internet software is written in C/C++, NUL won't work on most mail
 systems.  RFC 822 permits it, but RFC 2822 forbids it.
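
The truncation effect can be seen even from Python by going through the
standard ctypes module, which exposes C-style NUL-terminated buffers.
This is only a demonstration of the string convention, not of any
particular mail system; the function name is made up.

```python
import ctypes


def as_c_string(data):
    """Return what C string functions would see for the given bytes."""
    buf = ctypes.create_string_buffer(data)
    # .value stops at the first NUL, exactly as strlen()/strcpy() would.
    return buf.value
```

Here as_c_string(b"user\x00evil") returns b"user": everything after the
NUL is invisible to code that treats the buffer as a C string.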

Local Routing character conventions: !%@
 These have been used to express routing on local systems.  Their use in
 user email addresses is thus discouraged to be extra safe given the
 concerns they raise.

IMAP modified UTF-7 (RFC 2060): &
 The '&' character is the escape character for IMAP modified UTF-7.
 Thus it is likely to cause problems in user names on an IMAP-based
 mailstore.
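
A sketch of the encoding in Python, following the RFC 2060 description
(printable ASCII passes through, '&' becomes "&-", and other characters
are sent as UTF-16BE in a modified base64 that uses ',' instead of '/'
and drops the '=' padding).  The function name is an assumption for the
example; it is not a standard library call.

```python
import base64


def encode_modified_utf7(name):
    """Encode a mailbox name in IMAP modified UTF-7 (RFC 2060)."""
    out = []
    pending = []  # run of characters that must be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # '&' is the shift character, so a literal one is escaped.
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

A user name such as "R&D" therefore has to be stored as "R&-D".  Software
that skips this step, or applies it twice, will look in the wrong mailbox.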

LDAP search filter specials (RFC 2254): *()\ and NUL
 More systems, including iPlanet Messaging Server, are using LDAP to
 locally route email.  These five characters need to be quoted in LDAP
 search filters, but the traditional LDAP C SDK makes this step easy to
 forget.
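
The quoting step itself is small.  A Python sketch of the RFC 2254 rule
(each special character is replaced by a backslash and its two-digit hex
code) follows.  The function names and the "mail" attribute in the filter
are assumptions for illustration.

```python
# RFC 2254 escapes for the five special characters.
LDAP_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value):
    """Escape a value for use inside an LDAP search filter."""
    # Working character by character avoids double-escaping the
    # backslashes that the replacements themselves introduce.
    return "".join(LDAP_ESCAPES.get(ch, ch) for ch in value)


def mail_filter(address):
    """Build a (hypothetical) routing filter on the 'mail' attribute."""
    return "(mail=" + escape_ldap_filter_value(address) + ")"
```

Without the escape, an address containing '*' would act as a wildcard in
the search and could match other users' entries.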

JavaScript: '"\
 Most client-side web form validation is done using JavaScript.  Thus the
 primary quoting characters in JavaScript may be problematic in email
 addresses.

