John C Klensin <KLENSIN(_at_)INFOODS(_dot_)UNU(_dot_)EDU> writes (30 Apr 1993
13:18:43 -0400):
Note that + and =, and occasionally / are often used to specify
"subaddresses", special local handling associated with a given
mailbox, or storage directly into file systems. As long as the
"opacity" rule (no one but the RHS host tries to interpret anything on
the LHS of "@") is strictly followed and hosts with extended character
sets in mailbox names don't use these conventions, this is not a
problem, but it is a likely source of confusion to users and
implementations.
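As an illustration of the "opacity" rule described above, here is a minimal sketch in Python. It assumes a plain mailbox@host address and a site-local "+" subaddress convention. The function names and the example.org addresses are hypothetical and do not come from any RFC or existing mailer.

```python
def split_address(address):
    """Split at the last "@"; the part before it is the opaque local part."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox@host address: {address!r}")
    return local, domain


def next_hop(address):
    """What a relaying host may look at: only the right-hand side."""
    _local, domain = split_address(address)
    return domain.lower()  # host names are case-insensitive


def local_delivery(address, separator="+"):
    """Only the destination host interprets the local part.

    The "+" subaddress separator is an assumed site-local convention.
    """
    local, _domain = split_address(address)
    mailbox, _, detail = local.partition(separator)
    return mailbox, detail
```

A relay would call only next_hop() and pass the local part through untouched. Only the host named on the right-hand side would call local_delivery(), so whatever meaning it gives to "+" or "=" stays private to that host.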
Of course new rules for enhanced address interpretation must
involve education of implementors and system administrators. The
same is true for all other measures that are taken to extend the
usability of the Internet protocols beyond what was originally
intended. The Internet has excellent instruments for
disseminating such information in the IETF standardization
process (mailing-lists, Internet drafts, RFCs).
The necessary changes of software and practices shouldn't be
exaggerated, though. This particular improvement of email
functionality will not imply _any_ changes in MTAs, in the mail
transport mechanism, or in the message format; it will only
affect UAs, gateways to other email systems and system
administration. (The level of functionality and user-
friendliness of Internet UA software really must be much improved
in the near future anyway; enhanced address interpretation would
be only a small and relatively simple part of this.)
The impact on users will be very small as I have tried to explain
in my previous messages.
People who almost exclusively use email within their own country
don't need to know about the two different forms (or rather
interpretations) of addresses; they can get by seeing, writing
and knowing only the enhanced form of addresses. But suppose
such a "naive" user is confronted (say, on a business card) with
an old-fashioned address of an American user containing "=" or
another character with a special meaning in enhanced address
interpretation and wants to write a letter to this person. No
real problem will arise: The sender will type in the ASCII
character sequence. If he has a good UA it will display the
address in both the enhanced form and the ASCII form (since it
was input in an unusual way). The letter will reach the American
recipient and when he looks at the message header he will see his
own address in the ASCII form he is used to.
In North America users and system administrators can completely
ignore this step to make the Internet more international and less
English-biased. They don't even have to change any use of
encoding-special characters in their own mailbox names.
More broadly, the fact that you haven't seen the use of any of these
characters in network routing (akin to "!" and "%") doesn't mean that it
can't happen and, indeed, that there isn't some gateway out there which,
after being explicitly addressed, is calmly using them to feed rewrite
rules into its transport model.
The exact rules of the encoding should be chosen not to break
address translation schemes that are published as RFCs or
otherwise widely used. (I hope that isn't impossible, but my
insights into these matters are very limited.) To adjust the
encoding to any hack used to connect for example Microsoft Mail
systems to the email Internet isn't necessary in my opinion.
Well, MIME and the SMTP extensions are raising the conformance threshold
-- reducing the amount of nonsense one can commit and still expect to
be able to get mail in and out. And we are seeing some serious
discomfort and some shaking out. Nothing we have done yet is as
potentially disruptive in subtle ways as changing the interpretation of
mailbox names.
Can you or someone else describe a concrete scenario in which my
proposal would disrupt the Internet? I still see it as much less
threatening than MIME in this regard. No one using an old-
fashioned UA will be prevented from sending mail to an address
containing (encoded) non-ASCII characters. No one using a new UA
will be prevented from sending mail to an old address, even if it
happens to include special characters in such a way that it,
according to the enhanced address interpretation, contains non-
ASCII letters.
Beyond that, I'd prefer to see us concentrate on ways to avoid this
problem until and unless we see a real requirement. "Users want to"
doesn't carry a lot of weight with me here--users want DWIM behavior all
over the place, even if that implies deducing that the same uttered
sentence means radically different things on different days, depending only on
their subconscious state.
1) We should be able to do something about problems we see
_before_ the situation gets so bad that users begin to
complain loudly.
2) I see this as a natural continuation of the effort to remove
English-bias in Internet protocols and make them truly
international. A request from the Scandinavian countries for
a legal method to use non-ASCII characters in RFC 822 messages
was one (if not _the_) starting-point for the MIME work.
Protests from the same countries in the fall of 1991 led to
RFC 1342, making it possible to use non-ASCII characters also
in the Subject: field. (That step was not taken because of
overwhelming user pressure, but because of foresight.) The
next logical step should be to make it possible to choose also
Internet _names_ from other languages than English and
Swahili.
3) There are several reasons why users aren't complaining loudly
about this problem. One is that Internet email outside
English-speaking countries is still very much a phenomenon
among fairly sophisticated people in the academic environment
and international companies, who are more or less fluent in
English and can live with the situation. Another reason is
that people, when introduced to a new technology like email,
really aren't inclined to make a fuss about its shortcomings
in minor respects; they prefer to explore the new
possibilities first and accept various hacks in the meantime.
(Compare the pre-MIME devices used in netnews to communicate
images, text in Vietnamese etc.)
4) I'm convinced that the possibility of having an obvious
connection between one's real name and one's email address
(even making the mailbox name almost equal to one's real name)
is a valuable property of the Internet address style. Why
should it not be extended to all Internet users and no longer
reserved for only the English-speaking users?
5) I'm probably as sceptical of attempts at providing DWIM
behavior in software as you are. But my proposal has nothing
to do with that. On the contrary, it provides a perfectly
deterministic way to map the actual names of people all over
the world to Internet addresses conforming to today's rules.
More short-term, local or partial solutions ("ways to avoid
the problem until we see a real requirement") will be more
problematic in this respect.
And Ohta-san is, of course, right -- one
shouldn't pretend to solve this problem by solving it for part of the
world's population and making a bigger mess for the rest.
I suppose that you're referring here to the criticism of ISO
10646 that it's unusable for representing text in Chinese,
Japanese and Korean because of the unification of CJK Han
characters. In my opinion this argument is misdirected in the
general case, but particularly for the encoding of addresses it
is completely lame. Personal names are not multilingual but
_monolingual_. In email addresses we therefore don't need a way
to distinguish between different Han characters solely because of
different language.
If you want to maintain interoperability at no worse than today's
levels, then unique mailbox identifiers need to remain in ASCII or
restricted subsets of it (and should preferably be case-insensitive).
Yes, this was also a goal for my proposed encoding, which I think
it fulfils.
--
Olle Jarnefors, Royal Institute of Technology, Stockholm
<ojarnef(_at_)admin(_dot_)kth(_dot_)se>