ietf

Re: IDN and language

2005-01-04 10:23:35


--On Tuesday, 04 January, 2005 09:38 -0500 Bruce Lilly
<blilly(_at_)erols(_dot_)com> wrote:

One is not.  Domain names are strings of characters; only
incidentally do they spell out one or more words in one or
more languages.  I doubt whether the names "Google," "Yahoo,"
and "AltaVista" can be pinned down as belonging to one
specific language.

I was referring specifically to internationalized domain names
(IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire
domain name continues to be of traditional form (ANSI X3.4
letters, digits, and hyphen (with restrictions on combinations
and placement)), but where a certain class of names (those
beginning with "xn--") are "internationalized" and might be
presented to users in a different form (which can include
non-ASCII characters).  That came about because of the
tendency to associate a domain name (tag) with a natural
language "name" or legally-registered name (trademark, etc.).
Whether one considers such associations logical or
irrational, that is what has happened.  So one could have
a domain name (beginning with xn--) that is presented by
an application as "Nestlé.com".  Now certainly some names,
such as your examples, Kodak, Häagen-Dazs, etc. have no
language (because they are made-up strings of characters),
but others do have a specific language.  In skimming through
the RFCs mentioned above, it appears that there is now some
provision for language tagging (which was not present in
earlier versions of IDN).  However, I have not thoroughly
reviewed those recent additions; therefore it should be
clear that I have not reviewed the impact of the proposed
draft changes on IDN or vice versa.  Such a review should
take place (ideally before the deadline for the New Last
Call on draft-phillips-langtags-08 (tomorrow!)), but I'm
not the person to do so as I have only slight interest in
IDN (I'm one of those who considers associating a tag
with natural language and/or legally registered names to
be irrational).  One potential issue is that domain names
are case-insensitive, and whether lower-case accented
characters map to/compare with unaccented upper-case
letters may be a function of language (or culture, or
political fiat).
...
I would add that there is apparently some discussion of
wreaking similar havoc on local-parts, which appear in
message-identifiers and email mailbox identifiers (STD 11).
That too should be evaluated w.r.t. specification of
language and the proposed changes.

Bruce,

While I'm sympathetic to many of the points you have raised, the
IDN situation is not an issue except in a very narrow sense, and a
similar situation would apply to local-parts if we ever do
something there.  In the IDN case, the protocols are written in
terms of arbitrary Unicode strings and just about have to be --
there has never been a DNS restriction requiring that the labels
be names or words in a language.  The protocols apply some
mapping rules that reject a few characters (and hence the labels
that contain them) and change some characters into others, but
the net effect is still a set of standards written in terms of
strings, not languages.  There has been a good deal of concern
in the DNS community about the potential for deliberately or
accidentally misleading users about domain names and the
consequent opportunities for confusion or outright fraud.  Those
concerns have led to a good deal of work on restrictions about
what strings can be registered, imposing, e.g., rules that the
holder of one string may be the only permitted holder of a
related one and rules that prohibit mixing scripts within a
single label.  These types of rules, especially the latter, are
the "very narrow sense" mentioned above, but they have no impact
on the protocols themselves.  The registration rules actually
differ from zone to zone and can safely do so because, to the
user of the DNS, an unregistered name is an unregistered name
and the distinction as to whether a name is unregistered because
no one wanted it or because some subtle rule prohibited its
registration is not of importance.
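
To illustrate the "strings, not languages" point, here is a minimal
sketch that assumes Python's built-in "idna" codec as a stand-in for the
RFC 3490 operations. The helper names to_wire and is_rejected are
hypothetical, chosen only for this example. It shows one mapping (case
folding of a non-ASCII label) and one rejection (a private-use code
point), neither of which involves any knowledge of language.

```python
# Sketch only: Python's standard "idna" codec implements the IDNA 2003
# ToASCII/ToUnicode operations (RFC 3490) with nameprep (RFC 3491).

def to_wire(name: str) -> bytes:
    """Presentation form -> on-the-wire ASCII form (hypothetical helper)."""
    return name.encode("idna")


def is_rejected(name: str) -> bool:
    """Return True if the mapping rules refuse the string outright."""
    try:
        to_wire(name)
    except UnicodeError:
        return True
    return False


# Mapping: the non-ASCII label is case-folded before encoding, so both
# presentation forms produce the same "xn--" label on the wire.
lower = to_wire("nestlé.com")
mixed = to_wire("Nestlé.com")

# Rejection: U+E000 is a private-use code point, which nameprep prohibits.
private_use = "nestl\ue000é.com"

print(lower, mixed, is_rejected(private_use))
```

Nothing in this sketch asks what language the label is "in"; any
script-mixing or related-string rule would have to be applied separately,
at registration time, by the zone.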

The situation with local-parts will, most of us are convinced,
work out in much the same way.  There is a long history of
strings used in local-parts that are not "names", "words", or
otherwise bound to a particular language.  Worse, different
destination systems apply different internal syntax rules and
interpretations to local-part strings.  Protocols will need to
be designed to reflect that history and avoid unreasonable
restrictions.  At the same time, I would expect the
administrators of a given local system to impose restrictions
on what local-parts can be used for mailboxes there (just
as is often done today).   Those restrictions may, in many
cases, reflect assumptions about languages and/or scripts but,
since they are purely local conventions, there is no need for
external registration.

Returning to the DNS/IDN situation, ICANN has created a
recommendation for all TLDs, and a requirement on at least some
gTLDs, that languages not be mixed within a label and for
registration and use of tables similar to those recommended by
RFC 3743.  Those tables are identified by a combination of the
domain name associated with the registering TLD registry and an
RFC 3066 code.  That system is not, IMO, working especially well and
the RFC 3066 code model will, I think, have to be extended to deal
with some unusual situations.   But, interestingly,
draft-phillips... doesn't appear to solve that particular
problem: what is needed is a way to specify odd mixtures of
languages and/or scripts that may be appropriate to a particular
zone, and that means less specificity and more
linguistically-strange constructions, not more specificity and
structure.  
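
A rough sketch of the table-identification scheme described above, under
loud assumptions: the registry name "example", the two tiny character
tables, and the function permitted are all hypothetical, and the layout
is not the RFC 3743 table format. It only shows tables keyed by a
(registry domain, RFC 3066 code) pair and a per-table membership check.

```python
from typing import Dict, FrozenSet, Tuple

# Key: (registry domain name, RFC 3066 language code).
TableKey = Tuple[str, str]

# Hypothetical, deliberately tiny tables for an imaginary "example" zone.
TABLES: Dict[TableKey, FrozenSet[str]] = {
    ("example", "fr"): frozenset("abcdefghijklmnopqrstuvwxyz0123456789-àâçéèêëîïôùûüÿ"),
    ("example", "el"): frozenset("αβγδεζηθικλμνξοπρστυφχψως0123456789-"),
}


def permitted(label: str, registry: str, tag: str) -> bool:
    """True if every character of label is in the table for (registry, tag)."""
    table = TABLES.get((registry, tag))
    if table is None:
        # No table registered under this key, so nothing can be checked.
        return False
    return all(ch in table for ch in label.lower())


print(permitted("nestlé", "example", "fr"))   # Latin label, French table
print(permitted("λογος", "example", "el"))    # Greek label, Greek table
print(permitted("nestλé", "example", "fr"))   # mixed label, French table
print(permitted("nestλé", "example", "el"))   # mixed label, Greek table
```

A zone that legitimately wanted the mixed label would have no single
key under which to register a table for it, which is the gap described
above.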

     john



