Re: IDN and language



--On Tuesday, 04 January, 2005 12:52 -0500 John Cowan
<jcowan(_at_)reutershealth(_dot_)com> wrote:

John C Klensin scripsit:

Returning to the DNS/IDN situation, ICANN has created a
recommendation for all TLDs, and a requirement on at least
some gTLDs, that languages not be mixed within a label and for
registration and use of tables similar to those recommended by
RFC 3743.


This regulation is going to be completely unenforceable, since
with a few exceptions (hexagonal French), languages do not
have bright-line rules saying what words they do and do not
contain.  Are we to be in the position of saying that
eigenvector.com may be registered (and is) because the word
appears in dictionaries, whereas eigenevent.com is ruled out
because it "mixes" English and German?


John, I am sure that ICANN would welcome your participation as
the various rules/guidelines evolve -- those rules are not an
IETF problem, even though changes to the standard that is used
to label them might be.  One of the things their processes have
in common with the IETF is that they prefer that people actually
try to read and understand documents before attacking them, but
I suppose there are always exceptions.  In particular, the
recommendations of RFC 3743 are about tables of characters, not
dictionary lookup.  If, however, a domain decided to adopt a
canonical dictionary and require a lookup in it as a registration
criterion, that rule would be perfectly enforceable.  I'd
recommend against it for many reasons, but the choice would be
more or less up to that domain.
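
To make the distinction concrete, here is a rough sketch in Python of
what a table-based check looks like.  The table contents and the names
PERMITTED and label_allowed are invented for illustration; they are not
taken from RFC 3743 or from any registry's actual tables.

```python
# Hypothetical per-language character tables; a real registry would
# publish its own.
BASIC = set("abcdefghijklmnopqrstuvwxyz0123456789-")
PERMITTED = {
    "en": BASIC,
    "de": BASIC | set("äöü"),
}


def label_allowed(label: str, language: str) -> bool:
    """Return True if every character of the label is in the table
    registered for the given language.  No dictionary is consulted."""
    table = PERMITTED[language]
    return all(ch in table for ch in label.lower())
```

Under these assumed tables, both "eigenvector" and "eigenevent" pass
the "en" check: it looks only at characters, never at whether the
string is a word in any language.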

Forbidding the mixing of scripts is another matter, although
in fact some languages are written using more than one
(Unicode) script.


Whether those languages are a problem or not in the DNS context
depends on whether one wishes to permit a single label to use
both (or all three in at least a few cases I know of) scripts.
Again, that is a per-registry decision and, again, perfectly
enforceable either way.  Other issues occur if the writing order of
characters in a language obeys specific rules and one chooses to
enforce them (a potential issue with, e.g., Hangul, although,
again, the choice of whether or not to try to enforce is up to
the registry).  But one of the notational problems with using
RFC 3066 would be expressing a rule that one can have a label that
contains the characters of a given language written in, e.g.,
either a modified Arabic script or a modified Cyrillic one but not
in a modified Roman ("Latin") one.  Another issue arises when one
wants to permit a character collection that includes the
characters from a given script that are used by two separate
languages -- not all of the characters of that script, but
exactly those characters that fall into the union of the
characters from the script used by the relevant languages.  It
is not clear that the current proposal is much better than
RFC 3066 for handling those cases, but I wonder whether anyone has
carefully evaluated whether it would make things worse.
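
As a rough illustration of the per-registry rules discussed above, the
Python sketch below covers three of them: one script per label, a
per-language list of acceptable scripts, and a table built as the union
of the characters two languages use.  Everything in it is an assumption
made for the example.  Python's standard library does not expose the
Unicode Script property, so rough_script uses the first word of the
character name as a crude stand-in, and the tags "lang-a", "lang-b" and
"lang-c", the tables, and the function names are all hypothetical.

```python
import unicodedata

KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "ARABIC", "HANGUL"}


def rough_script(ch: str) -> str:
    """Crude stand-in for the Unicode Script property (an assumption):
    take the first word of the character's Unicode name."""
    first_word = unicodedata.name(ch, "UNKNOWN").split()[0]
    return first_word if first_word in KNOWN_SCRIPTS else "COMMON"


def scripts_in(label: str) -> set:
    """Scripts used in a label, ignoring digits, hyphens and the like."""
    return {rough_script(ch) for ch in label} - {"COMMON"}


# Hypothetical registry policy: the scripts in which each language
# may be written within this zone.
ALLOWED_SCRIPTS = {"lang-a": {"ARABIC", "CYRILLIC"}}


def label_ok(label: str, language: str) -> bool:
    """At most one script per label, and that script must be allowed
    for the language."""
    used = scripts_in(label)
    return len(used) <= 1 and used <= ALLOWED_SCRIPTS[language]


# Hypothetical per-language character tables, all from one script.
LANG_CHARS = {"lang-b": set("абв"), "lang-c": set("вгд")}


def union_table(*languages: str) -> set:
    """Exactly the characters used by any of the given languages,
    not the whole script."""
    table = set()
    for language in languages:
        table |= LANG_CHARS[language]
    return table
```

The point of the sketch is only that each rule is mechanical once the
registry has chosen its tables; how to name such a rule with a
language tag is a separate question.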

      john


