ietf-822

Re: FYI: BOF on Internationalized Email Addresses (IEA)

2003-10-28 09:34:37

Everyone,

Either for general efficiency or just to do me a favor, can we
please pick one list --I'd recommend IMAA unless Paul objects--
and move these discussions to it only.   I'd like to participate
(after all, it is my draft and BOF request that set off these
two threads), but am in an environment that is hostile to my
being able to read email in a leisurely way -- the cross-postings are
making the volume look larger than it is, and I don't have time
to sift through and organize it.

Now, an observation or two.

Keith, please read draft-klensin-emailaddr-01.txt -- it contains
a fairly extensive treatment of the issue you identify below.
It also explicitly discusses the tradeoffs along the spectrum
from easy global interoperability (at both the protocol and the
user interface/perception level) to full, culturally-appropriate
and optimized, localization.  Short answer is "can't have it
both ways", but that is a no-brainer.   I don't know that my
analysis is any better than yours, or where you would eventually
end up, but, if we can start from a common base and terminology
and then, as needed, argue about it, we will, I think, save a
lot of time.

It also explores the case beyond the one you and Mark are
discussing -- what happens if one decides to start tampering
with the "@" in mail or those nasty ASCII slashes, etc., in
URIs: if one is to go all the way to significantly non-Roman
scripts, those need to go too... or, at least, we need to
explore whether that is sensible and plausible.

There are two additional issues that I should have written about
in the draft and didn't.   

(1) Another advantage of "just" using UTF-8 in an appropriately
negotiated, controlled, and constrained environment is that any
idiosyncrasies and coding difficulties are Unicode
idiosyncrasies and coding difficulties.  If we decide to use a
specialized coding designed for email local-parts (and, fwiw, I
think Adam's coding solution is brilliant... I'm ultimately just
unhappy with the problem definition to which it responds), then
we have to deal with both its idiosyncrasies _and_ those of
Unicode.  Strikes me as a bad idea -- better to just blame
"them" :-)
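
To make the layering point concrete, here is a minimal sketch in
Python.  It assumes Python's built-in "punycode" codec as a stand-in
for a specialized ASCII-compatible coding; it is not the IMAA
algorithm, and the function names and the sample local-part are
purely illustrative.  Both paths depend on Unicode normalization,
but only the second adds a further coding layer with rules of its
own.

```python
import unicodedata


def to_utf8(local_part: str) -> bytes:
    """Unicode layer only: normalize, then serialize as UTF-8."""
    normalized = unicodedata.normalize("NFC", local_part)
    return normalized.encode("utf-8")


def to_ace(local_part: str) -> bytes:
    """Unicode layer plus an ASCII-compatible coding on top.

    Assumption: the stdlib "punycode" codec stands in for a
    specialized local-part coding; it is NOT the IMAA algorithm.
    """
    normalized = unicodedata.normalize("NFC", local_part)
    return normalized.encode("punycode")


local = "bücher"             # hypothetical local-part
utf8_form = to_utf8(local)   # bytes a UTF-8-capable transport would carry
ace_form = to_ace(local)     # ASCII-only bytes a legacy path would carry

print(utf8_form)
print(ace_form)
```

In the first function any surprises come from Unicode alone; in the
second, they can come from Unicode or from the extra coding.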

(2)  One might imagine using the machinery outlined in that
draft to transport mail across the network, and then, if needed,
use IMAA encoding to push the message into the mail store, make
it available for IMAP and POP, etc.  Not an ideal situation, but
that would clearly put that coding into the category of a
transition strategy that we could incrementally retire.  By
contrast, once we start moving tricky encodings across the
network as an alternative to a transport-based solution, every
realistic scenario I can think of says that we are stuck with
them forever.   That is, I think, more or less one of Mark's
arguments, but with a slightly different twist.

One final observation for now...

Our success record in not requiring email addresses to be typed
in, usually associated with some version of "The Directory", has
been abysmal.  Similarly, if users never had to look at URLs
(which was the intent) we would almost certainly not be having
these arguments about domain names and their formats -- the
"protocol element" argument would fly, and we'd all be working
on internationalization at a less constrained level of
abstraction.  But the pigs don't seem to be circling at
altitude, at least here in Carthage.

     john


--On Tuesday, October 28, 2003 10:27 -0500 Keith Moore
<moore(_at_)cs(_dot_)utk(_dot_)edu> wrote:

It is currently impossible to use the Internet without
knowing the Latin script. However, the goal of most
well-designed client software and operating systems is to
permit the user to work entirely within their native
language, with a fully localized system. This is reaching to
India and other countries; Microsoft has introduced fully
localized versions of Indic Windows just recently, and Linux
vendors are hard at work to produce fully localized versions
of their software.

Email and Web addresses are the big remaining holdouts for
most people. People should not be forced to use a script that
they are unfamiliar with, just to use email addresses and
sites in their own countries. Even if they are familiar with
the Latin script, it is very often a very bad match for their
languages, making it very difficult to figure out how native
words would be spelled in it.

and yet, as we've seen time and time again, local use of
nonportable addresses can cause major problems for the net as
a whole.   we saw this in earlier days of email with the
admixture of bitnet/rscs/nje, uucp, decnet, x.400, and Internet
addresses.  we've seen it in the IP space with RFC 1918
addresses.

in some ways internationalizing email addresses is a much
harder problem than internationalizing domain names, because no other
application is as dependent on having human beings actually
use addresses as email.  (yes, people do sometimes type in
URLs, but not nearly as often as they click on links.  and
there are apps which require humans to type in domain names,
but for most of them this only happens at configuration time.)

one way to approach this problem might be to make email less
dependent on having addresses typed in.

Keith