Re: Non-ASCII Internet addresses?

John C Klensin <KLENSIN(_at_)INFOODS(_dot_)UNU(_dot_)EDU> writes (30 Apr 1993 
13:18:43 -0400):

Note that + and =, and occasionally / are often used to specify
"subaddresses", special local handling associated with a given
mailbox, or storage directly into file systems.  As long as the
"opacity" rule (no one but the RHS host tries to interpret anything on
the LHS of "@") is strictly followed and hosts with extended character
sets in mailbox names don't use these conventions, this is not a
problem, but it is a likely source of confusion to users and
implementations.


Of course new rules for enhanced address interpretation must 
involve education of implementors and system administrators.  The 
same is true for all other measures that are taken to extend the 
usability of the Internet protocols beyond what was originally 
intended.  The Internet has excellent instruments for 
disseminating such information in the IETF standardization 
process (mailing-lists, Internet drafts, RFCs).

The necessary changes of software and practices shouldn't be 
exaggerated, though.  This particular improvement of email 
functionality will not imply _any_ changes in MTAs, in the mail 
transport mechanism, or in the message format; it will only 
affect UAs, gateways to other email systems, and system 
administration.  (The level of functionality and 
user-friendliness of Internet UA software really must be much 
improved in the near future anyway; enhanced address 
interpretation would be only a small and relatively simple part 
of this.)

The impact on users will be very small as I have tried to explain 
in my previous messages.

People who almost exclusively use email within their own country 
don't need to know about the two different forms (or rather 
interpretations) of addresses; they can go on seeing, writing 
and knowing only the enhanced form of addresses.  But suppose 
such a "naive" user is confronted (say on a business card) with 
an old-fashioned address of an American user containing "=" or 
another character with a special meaning in enhanced address 
interpretation and wants to write a letter to this person.  No 
real problem will arise:  The sender will type in the ASCII 
character sequence.  If he has a good UA it will display the 
address in both the enhanced form and the ASCII form (since it 
was input in an unusual way).  The letter will reach the American 
recipient and when he looks at the message header he will see his 
own address in the ASCII form he is used to.
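
The round trip described above can be sketched roughly as follows.  
This is only an illustration: the actual encoding rules are in my 
earlier messages and are not restated here, so the "=XX" hex escape 
for ISO 8859-1 characters, the function names and the example.se 
addresses below are all hypothetical stand-ins.  The point is only 
that the ASCII form is what gets transmitted, unchanged, while the 
UA derives the enhanced form for display.

```python
import re

# HYPOTHETICAL escape syntax, assumed for illustration only:
# "=XX" (two uppercase hex digits) in the local part stands for
# the ISO 8859-1 character with that code.
_ESCAPE = re.compile(r"=([0-9A-F]{2})")


def split_address(addr):
    """Split at the last "@"; only the local part is ever reinterpreted."""
    local, sep, domain = addr.rpartition("@")
    if not sep:
        raise ValueError("address has no '@': %r" % addr)
    return local, domain


def enhanced_form(addr):
    """Return the display form of an ASCII address under the assumed escape."""
    local, domain = split_address(addr)
    decoded = _ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]).decode("latin-1"), local
    )
    return decoded + "@" + domain


def display_forms(addr):
    """Return (ascii_form, enhanced_form).

    The ASCII form is what goes into the message header; a UA could
    show both forms whenever they differ.
    """
    return addr, enhanced_form(addr)
```

With this sketch, an address typed in as ASCII is sent exactly as 
typed, so the recipient sees his own address in the form he is used 
to; only the sender's UA shows an additional, decoded rendering.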

In North America users and system administrators can completely 
ignore this step to make the Internet more international and less 
English-biased.  They don't even have to change any use of 
encoding-special characters in their own mailbox names.

More broadly, the fact that you haven't seen the use of any of these
characters in network routing (akin to "!" and "%") doesn't mean that it
can't happen and, indeed, that there isn't some gateway out there which,
after being explicitly addressed, is calmly using them to feed rewrite
rules into its transport model.


The exact rules of the encoding should be chosen not to break 
address translation schemes that are published as RFCs or 
otherwise widely used.  (I hope that isn't impossible, but my 
insights into these matters are very limited.)  Adjusting the 
encoding to every hack used to connect, for example, Microsoft Mail 
systems to Internet email isn't necessary in my opinion.

Well, MIME and the SMTP extensions are raising the conformance threshold
-- reducing the amount of nonsense one can commit and still expect to
be able to get mail in and out.  And we are seeing some serious
discomfort and some shaking out.  Nothing we have done yet is as
potentially disruptive in subtle ways as changing the interpretation of
mailbox names.


Can you or someone else describe a concrete scenario in which my 
proposal would disrupt the Internet?  I still see it as much less 
threatening than MIME in this regard.  No one using an 
old-fashioned UA will be prevented from sending mail to an address 
containing (encoded) non-ASCII characters.  No one using a new UA 
will be prevented from sending mail to an old address, even if it 
happens to include special characters in such a way that, 
according to the enhanced address interpretation, it contains 
non-ASCII letters.

Beyond that, I'd prefer to see us concentrate on ways to avoid this
problem until and unless we see a real requirement.  "Users want to"
doesn't carry a lot of weight with me here--users want DWIM behavior all
over the place, even if that implies deducing that the same uttered
sentence means radically different things on different days, depending
only on their subconscious state.


1) We should be able to do something about problems we see 
   _before_ the situation gets so bad that users begin to 
   complain loudly.

2) I see this as a natural continuation of the effort to remove 
   English-bias in Internet protocols and make them truly 
   international.  A request from the Scandinavian countries for 
   a legal method to use non-ASCII characters in RFC 822 messages 
   was one (if not _the_) starting-point for the MIME work.  
   Protests from the same countries in the fall of 1991 led to 
   RFC 1342, making it possible to use non-ASCII characters also 
   in the Subject: field.  (That step was not taken because of 
   overwhelming user pressure, but because of foresight.)  The 
   next logical step should be to make it possible to choose also 
   Internet _names_ from other languages than English and 
   Swahili.

3) There are several reasons why users aren't complaining loudly 
   about this problem.  One is that Internet email outside 
   English-speaking countries still very much is a phenomenon 
   among fairly sophisticated people in the academic environment 
   and international companies, who are more or less fluent in 
   English and can live with the situation.  Another reason is 
   that people, when introduced to a new technology like email, 
   really aren't inclined to make a fuss about its shortcomings 
   in minor respects; they prefer to explore the new 
   possibilities first and accept various hacks in the meantime.  
   (Compare the pre-MIME devices used in netnews to communicate 
   images, text in Vietnamese etc.)

4) I'm convinced that the possibility to have an obvious 
   connection between one's real name and one's email address 
   (even make the mailbox name almost equal to one's real name) 
   is a valuable property of the Internet address style.  Why 
   should it not be extended to all Internet users and no longer 
   reserved for only the English-speaking users?

5) I'm probably as sceptical of attempts at providing DWIM 
   behavior in software as you are.  But my proposal has nothing 
   to do with that.  On the contrary, it provides a perfectly 
   deterministic way to map the actual names of people all over 
   the world to Internet addresses conforming to today's rules.  
   More short-term, local or partial solutions ("ways to avoid 
   the problem until we see a real requirement") will be more 
   problematic in this respect.

And Ohta-san is, of course, right--one
shouldn't pretend to solve this problem by solving it for part of the
world's population and making a bigger mess for the rest.


I suppose that you're referring here to the criticism of ISO 
10646 that it's unusable for representing text in Chinese, 
Japanese and Korean because of the unification of CJK Han 
characters.  In my opinion this argument is misdirected in the 
general case, but particularly for the encoding of addresses it 
is completely lame.  Personal names are not multilingual but 
_monolingual_.  In email addresses we therefore don't need a way 
to distinguish between different Han characters solely because of 
different language.

If you want to maintain interoperability at no worse than today's
levels, then unique mailbox identifiers need to remain in ASCII or
restricted subsets of it (and should preferably be case-insensitive).


Yes, this was also a goal for my proposed encoding, which I think 
it fulfils.

--
Olle Jarnefors, Royal Institute of Technology, Stockholm 
<ojarnef(_at_)admin(_dot_)kth(_dot_)se>