Re: Non-ASCII Internet addresses?

1993-04-30 10:19:13
Two observations, one specific and one general.

I have proposed the use of five characters with special meaning
for encoding non-ASCII characters in addresses:  *&'_=

As an alternative for the first four, possible to use in X.400(84)
addresses, I suggested:  =/'+

Note that + and =, and occasionally / are often used to specify
"subaddresses", special local handling associated with a given
mailbox, or storage directly into file systems.  As long as the
"opacity" rule (no one but the RHS host tries to interpret anything on
the LHS of "@") is strictly followed and hosts with extended character
sets in mailbox names don't use these conventions, this is not a
problem, but it is a likely source of confusion to users and

More broadly, the fact that you haven't seen the use of any of these
characters in network routing (akin to "!" and "%") doesn't mean that it
can't happen and, indeed, that there isn't some gateway out there which,
after being explicitly addressed, is calmly using them to feed rewrite
rules into its transport model.


I think we are asking the wrong question here.  We are asking "is there
some way to make this work".  I think the answer to that is almost
certainly "yes, given conforming and robust implementations of both
clients and servers".  

But we've just turned up problems as a result of the transport
extensions that result from blatant violation or gross misreading of the
standards (and, to paraphrase Mark, some of these folks aren't
interested in listening, much less fixing their problems).  That follows
on experiences with MIME in which header-trashing by MTAs and gateways
has defeated our best efforts to make these extensions completely
transparent to transport issues.  This shouldn't be surprising--most of
us have known for years that there are lots of terrible systems in the
overall mail environment; that things work at all only because lots of
software has been incrementally engineered to be robust against all
sorts of nonsense.

Well, MIME and the SMTP extensions are raising the conformance threshold
-- reducing the amount of nonsense one can commit and still expect to
be able to get mail in and out.  And we are seeing some serious
discomfort and some shaking out.  Nothing we have done yet is as
potentially disruptive in subtle ways as changing the interpretation of
mailbox names.

So, first of all, I'd like to see us let the things we already have out
there as proposed standards shake out the first-level trash
implementations before we deploy something that is sensitive to
conformity to [usually] second-level rules.

Beyond that, I'd prefer to see us concentrate on ways to avoid this
problem until and unless we see a real requirement.  "Users want to"
doesn't carry a lot of weight with me here--users want DWIM behavior all
over the place, even if that implies deducing that the same uttered
sentence means radically different things on different days, depending only on
their subconscious state.  In some cases, we have an educational job to
do: the "user's real name should be the mailbox name" convention is just
that, and very handy, but few of us can be telephoned by punching out
our names on the phone's keypad or sent postal mail without address
elements other than our names.  And Ohta-san is, of course, right--one
shouldn't pretend to solve this problem by solving it for part of the
world's population and making a bigger mess for the rest.

If you want to maintain interoperability at no worse than today's
levels, then unique mailbox identifiers need to remain in ASCII or
restricted subsets of it (and should preferably be case-insensitive).  
Notwithstanding Keith's comments, pointers and references to mailbox
identifiers could rationally take other forms in appropriate
environments if the value of doing so outweighed the costs of what users
would see as multiple names for the same mailbox.
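
To make the "restricted subset, case-insensitive" idea concrete, here is
a minimal sketch of what such a check could look like.  The allowed
character set (letters, digits, "." and "-") and the function names are
assumptions chosen for illustration; they are not taken from any
standard or proposal.  The set deliberately leaves out the special
characters discussed above.

```python
import string

# Assumed, deliberately conservative subset: ASCII letters, digits, "." and "-".
# This is an illustrative choice, not a rule from any standard.
ALLOWED = frozenset(string.ascii_letters + string.digits + ".-")


def is_restricted_ascii(local_part: str) -> bool:
    """True if the mailbox name is non-empty and uses only the assumed subset."""
    return bool(local_part) and all(ch in ALLOWED for ch in local_part)


def canonical(address: str) -> str:
    """Return a lower-cased form of local@domain for comparison purposes.

    Splits at the last "@" and rejects mailbox names outside the subset.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not domain or not is_restricted_ascii(local):
        raise ValueError("not a restricted-ASCII mailbox identifier: %r" % address)
    return local.lower() + "@" + domain.lower()


def same_mailbox(a: str, b: str) -> bool:
    """Compare two identifiers case-insensitively, as suggested above."""
    return canonical(a) == canonical(b)
```

Under these assumptions, "John.Doe@Example.COM" and
"john.doe@example.com" name the same mailbox, while a mailbox name
containing "=" or a non-ASCII character is rejected outright rather
than interpreted.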

