Re: FYI: BOF on Internationalized Email Addresses (IEA)


The extremely broad To/Cc list was appropriate for the initial
announcement of the BOF, but for this ensuing discussion I'm guessing it
would be good to trim it down, so I did.

Mark Crispin <mrc(_at_)CAC(_dot_)Washington(_dot_)EDU> wrote:

As presently constituted, email addresses are limited to the 26 Latin
alphabetics, 10 digits, and a limited number of special characters in
the ASCII character set.


Not so limited.  According to RFCs 821 & 822, all ASCII characters are
allowed.  According to RFCs 2821 & 2822, NUL is "obsolete", as are CR
and LF except as the pair CRLF.  (Obsolete means must be accepted and
must not be generated.)

Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> wrote:

What is the source of the "growing need"?  Is it:

a. for users of many languages (particularly those not using Latin
   alphabets) email addresses are difficult to remember
b. for users of many languages (particularly those not using Latin
   alphabets) email addresses are difficult to transcribe or type
c. users want to use their names in email addresses
d. users are confused by apparently arbitrary restrictions on use of
   characters in email addresses, and this leads to mistakes
e. on computer systems employing non-ASCII names for other purposes
   (e.g. login or account names) these do not map well to ASCII email
   addresses

or something else that I don't see?


Regarding (a), there are at least two kinds of remembering: one is
recognition (is this address the same one I saw yesterday? is it a font
variation or a different character?); the other, more challenging, is
recall (mentally retrieve the address I saw yesterday).  Even harder
than remembering is reproducing (draw the characters or find them on a
keyboard), which is (b).

I've heard claims of all of those sources, except (d).  But I think (d)
will become true if internationalized mail addresses are not introduced.
I think users will be astonished that non-ASCII characters are allowed
after the at-sign but not before it.
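
A minimal sketch of that asymmetry, assuming Python's built-in "idna"
codec (which implements the IDNA ToASCII mapping) and a made-up address
used purely for illustration.  The domain part has a defined
ASCII-compatible form; the local part has no comparable standard
mapping, so it simply fails an ASCII check.

```python
# Hypothetical address parts, for illustration only.
local_part = "jürgen"
domain = "bücher.example"

# The domain can be converted to an ASCII-compatible encoding via IDNA.
ascii_domain = domain.encode("idna")


def is_ascii(text):
    """Return True if every character in text is in the ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# The converted domain is pure ASCII; the local part is not, and nothing
# analogous to IDNA is applied to it here.
print(ascii_domain)
print(is_ascii(ascii_domain.decode("ascii")), is_ascii(local_part))
```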

I guess a problem statement should include both the motivation and the
challenge.  The challenge is the same as for internationalized domain
names:  Given a huge installed infrastructure of protocols, end-user
software, and intermediate software, all built on the assumption that
identifiers are ASCII, how can you relax that assumption without causing
so much breakage and non-interoperability that people would rather stick
with the existing ASCII system than endure the transition?

There are presumably several challenges, but that is the one I see as
the main one.  I suppose that the people advocating approaches
very different from IMAA might think I'm overestimating the height
of this hurdle, and therefore might see something else as the main
challenge.

AMC