Date: 2005-02-08 08:39
From: John C Klensin <john-ietf(_at_)jck(_dot_)com>
Well, it is a little worse because there are tools that make
detection of the YAH00.COM problem and its relatives pretty easy
and those tools are widely understood. For example, forcing
those domain names to lower case makes them very distinguishable
(yahoo.com and yah00.com are pretty clearly different), and
using fonts that make zeros and "o"s, ones and "l"s, etc.,
clearly different helps a lot too.
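A minimal Python sketch of the kind of detection tool described above. It assumes a deliberately tiny, hypothetical look-alike table (zero to "o", one to "l"); `skeleton` and `looks_confusable` are illustrative names, not an existing tool:

```python
# Hypothetical, hand-picked table of ASCII look-alikes. A real tool would
# need a much larger table; this one is an illustrative assumption only.
LOOKALIKES = str.maketrans({"0": "o", "1": "l"})


def skeleton(name: str) -> str:
    """Fold case, then map digits that resemble letters onto those letters."""
    return name.lower().translate(LOOKALIKES)


def looks_confusable(a: str, b: str) -> bool:
    """True if two names differ but collapse to the same skeleton."""
    return a.lower() != b.lower() and skeleton(a) == skeleton(b)


if __name__ == "__main__":
    print("YAH00.COM".lower())                         # yah00.com
    print(looks_confusable("YAH00.COM", "yahoo.com"))  # True
```

Folding case first keeps the table small, since only lower-case forms need entries.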
On the other hand, using lower case won't help if the "attacker"
uses Greek omicron instead of Latin 'O'.
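Lower-casing cannot catch the omicron case: U+039F GREEK CAPITAL LETTER OMICRON lower-cases to U+03BF GREEK SMALL LETTER OMICRON, not to Latin "o". One rough alternative is to flag labels that mix scripts. The sketch below guesses each letter's script from the first word of its Unicode character name; that heuristic is an assumption for illustration (the Python standard library does not expose the Unicode Script property), and the function names are hypothetical:

```python
import unicodedata


def scripts(label: str) -> set[str]:
    """Rough script guess: the first word of each letter's Unicode name.

    Illustrative heuristic only; it is not the Unicode Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label: str) -> bool:
    """True if the letters in `label` appear to come from more than one script."""
    return len(scripts(label)) > 1


if __name__ == "__main__":
    spoof = "YAH\u039f\u039f"         # Latin Y, A, H + two Greek capital omicrons
    print(spoof.lower() == "yahoo")   # False: lower-casing does not help
    print(is_mixed_script(spoof))     # True
```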
With IDNs, the simple fact that there are tens of thousands of
characters with which one can try to create confusion, rather
than 37 or so, means there are going to be more "opportunities".
What is more important, perhaps, is that we just don't have the
experience with the design of user interfaces that make problem
detection easy. For example, the moment I touched the Firefox
cursor to the examples at
http://www.shmoo.com/idn/, I realized that I really wanted to
see the punycode in the status line as well as the "native
I assume that rather than "punycode" (which is an encoding scheme
used for *part* of IDNs) you mean the on-the-wire dot-separated DNS
name components consisting solely of letters, digits, and hyphens.
If so, I have two comments:
1. That's not likely to help, as humans aren't very adept at
decoding IDNs on sight, and distinguishing one IDN from
another on sight isn't something that one would expect
casual users to be able to do; all IDNs tend to look like
"xn--blah", and many casual users lack the concern,
interest, inclination, or patience to look beyond "xn".
2. That would defeat the intent behind IDN, which is to present
what the on-the-wire DNS name represents rather than the
on-the-wire DNS name itself.
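A small Python illustration of point 1, using the standard library's "idna" codec (which implements IDNA 2003, RFC 3490). The helper names and sample names are hypothetical. The second sample uses a Cyrillic "а" (U+0430) in place of the Latin "a", and its on-the-wire form is just another opaque "xn--" label that gives a casual reader little to go on:

```python
def to_wire(name: str) -> str:
    """Return the on-the-wire (ACE, letters/digits/hyphens) form of a name.

    Uses Python's built-in "idna" codec, i.e. IDNA 2003 (RFC 3490).
    """
    return name.encode("idna").decode("ascii")


def to_display(wire: str) -> str:
    """Decode an ACE name back to the Unicode form a browser would show."""
    return wire.encode("ascii").decode("idna")


if __name__ == "__main__":
    # "b\u00fccher" is "bücher"; "p\u0430ypal" uses a Cyrillic small a.
    for name in ["b\u00fccher.example", "p\u0430ypal.com"]:
        wire = to_wire(name)
        print(wire, to_display(wire) == name)
```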
I'd add that one approach to the problem would be to undo the
encoding, query DNS to get an IP address, then present that
(possibly with associated SOA information and reverse domain
name lookup); numeric IP addresses aren't going to be mistaken
for some random collection of "characters" (in the Unicode
sense) or non-numeric glyphs.
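A rough Python sketch of that approach, assuming only the standard library: the name is converted to its on-the-wire form, resolved with `socket.getaddrinfo`, and each address is paired with a best-effort reverse (PTR) lookup. SOA retrieval is omitted because it needs a DNS library beyond the standard library; `resolve_for_display` and `reverse_name` are hypothetical names:

```python
import socket
from typing import List, Optional


def resolve_for_display(name: str) -> List[str]:
    """Return the distinct numeric addresses for `name`, sorted for display.

    The name is first converted to its on-the-wire (ACE) form with the
    standard "idna" codec (IDNA 2003), since that is what DNS actually sees.
    """
    wire = name.encode("idna").decode("ascii")
    infos = socket.getaddrinfo(wire, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # string is the first element of sockaddr for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def reverse_name(address: str) -> Optional[str]:
    """Best-effort reverse (PTR) lookup; None if there is no mapping."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


if __name__ == "__main__":
    # A numeric literal resolves without network access; substitute a real
    # (possibly internationalized) host name to exercise the full path.
    for address in resolve_for_display("127.0.0.1"):
        print(address, reverse_name(address))
```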
Regarding suggestions that some authority or authorities
should enact some restrictions intended to prevent such
misleading names: in the absence of a globally recognized
and effective enforcement mechanism, such measures are
meaningless. And I would hasten to add that the Big
Brother-esque world to which such measures would lead would
be highly undesirable (at least to those of us who have
no interest in being "Big Brother").
Just as with the YAH00.COM case, no single measure is going to
"fix" or prevent the various problems we can encounter with
IDNs. But a combination of some thinking, good policies,
adapting tools on the basis of experience, and the level of user
vigilance that seems a requirement for being attached to the
Internet at all these days ought to permit us to use IDNs at
risk comparable to that for LDH-style ASCII names.
I suspect the problem is intractable, and is rooted in the
(IMO ill-conceived) conflation of public DNS "names" (meaning
keywords in the RFC 1958 / RFC 2277 sense) with natural language
/ legal "names" (proper names, trademarks, etc.).
[And I agree with Ohta-san's statement that we are observing
the inevitable consequences, not only of internationalization,
but of the underlying conflation of protocol elements with
natural language names.]
I would also like to take this opportunity to repeat an earlier
suggestion, viz. that the IAB should update RFC 1958 and give
that update some status more substantive than "Informational".
In particular, such an update should clearly state that
protocol elements are simply that; any resemblance to natural
language names, places, or things is purely coincidental.
I can only hope that our colleagues at Mozilla will rapidly
supersede their apparent advice to disable IDNs -- advice that
seems to me to be equivalent to "you should be happy just using
I don't think that is the equivalent; letters, digits, and
hyphens are not peculiar to English, nor are domain name
components tied to any language -- they are simply protocol
elements that identify places in a hierarchical database
which maps to a database of values associated with a
hierarchical assemblage of such elements.
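To make the "protocol element" point concrete, here is a sketch of the host-name label rule: letters, digits, and hyphens, no leading or trailing hyphen, at most 63 characters per label (RFC 952 as relaxed by RFC 1123). The rule says nothing about any natural language, and ACE ("xn--") labels pass it. The function names are hypothetical, and the DNS protocol itself (RFC 2181) permits a wider range of octets than this host-name rule:

```python
import re
from typing import List

# One host-name label: 1-63 ASCII letters, digits, or hyphens, neither
# starting nor ending with a hyphen (RFC 952 as relaxed by RFC 1123).
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_name(name: str) -> bool:
    """True if every label of `name` follows the LDH host-name rule."""
    if name.endswith("."):           # a single trailing dot denotes the root
        name = name[:-1]
    if not name or len(name) > 253:  # 253 text chars ~ the 255-octet wire limit
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))


def labels_root_first(name: str) -> List[str]:
    """Labels in hierarchy order, root-most first, folded to lower case
    because DNS compares labels case-insensitively."""
    return [label.lower() for label in reversed(name.rstrip(".").split("."))]


if __name__ == "__main__":
    for n in ["www.example.com", "xn--bcher-kva.example",
              "b\u00fccher.example", "-bad.example"]:
        print(ascii(n), is_ldh_name(n))
    print(labels_root_first("WWW.Example.COM."))  # ['com', 'example', 'www']
```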
IMO, advice to disable IDNs is good advice; no
"internationalization" of protocol elements was necessary in the
first place, and the mechanism -- like a number of other
mechanisms in URL syntax (e.g. user/password delimiters in the
"authority" section, %-encodings) which have long been used to
obfuscate or mislead -- leads to predictable consequences. I
note in passing that other browser suppliers have disabled
similar mechanisms because of concerns about the sort of
issue under discussion.
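The user/password delimiter trick mentioned above can be seen with Python's `urllib.parse`: everything before the "@" in the authority is user information, not the host being contacted, and %-encodings can hide what a path spells. The URL and the `describe` helper below are hypothetical examples:

```python
from urllib.parse import unquote, urlsplit


def describe(url: str) -> dict:
    """Split a URL into the pieces a user might misread."""
    parts = urlsplit(url)
    return {
        # Everything before '@' in the authority is user information,
        # however much it looks like a host name.
        "userinfo": parts.username,
        # The host the client will actually contact (lower-cased by urllib).
        "host": parts.hostname,
        # %-escapes decoded, revealing what an obfuscated path spells.
        "path": unquote(parts.path),
    }


if __name__ == "__main__":
    print(describe("http://www.example.com@evil.example/%6C%6F%67%69%6E"))
    # {'userinfo': 'www.example.com', 'host': 'evil.example', 'path': '/login'}
```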
Ietf mailing list