RE: Will Language Wars Balkanize the Web?

2000-12-04 12:00:03
On Sun, 03 Dec 2000 13:17:45 EST, vint cerf <vcerf(_at_)MCI(_dot_)NET>  said:
to incorporate and refer to domain names. The IA4 alphabet includes
essentially just the letters A-Z, numbers 0-9 and the "-" (dash). This is
the limit of what is allowed in domain names today.

The sad part is, of course, that RFC1035, section 3.1 specifically says
that any octet value is legal.

The restrictions that Vint mentions are actually restrictions on the domain
name part of email addresses, as specified in RFC 821. The DNS itself
does not have such restrictions; this allows, for example, RFC 2782 to specify
the use of the "illegal" character _ (underscore) in some domain name parts.
The main restriction in the DNS itself is the comparison rule embedded in
the system, which says that domain names are case independent. Case
comparison is indeed specific to the alphabet code, and in fact is often
language dependent. The matter is already muddy for European
languages. In a case independent comparison in French, e-acute matches the
accentless e; in German, u-umlaut could match the digraph "ue"; DNS servers
don't do such matches, but at least they do the binary comparison right when
an 8-bit alphabet is a superset of ASCII. But the matter indeed gets more
complex when the characters are encoded on 16 bits, when either the top or
the bottom octet could be misinterpreted as a lower or upper case ASCII letter,
resulting in incorrect matches. So, at a minimum, we need an IETF
specification on how to detect that a domain name part is using a non-ASCII
encoding, so that DNS servers don't get lost.
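
To make the 16-bit problem concrete, here is a minimal sketch in Python. It
is an illustration only, not code from any DNS server: the helper names
ascii_fold and dns_labels_equal are hypothetical, and the sketch assumes the
server folds case by rewriting only the octets 0x41-0x5A and compares the
rest octet by octet. ISO 8859-1 and big-endian UTF-16 are my choice of
example encodings.

```python
def ascii_fold(label: bytes) -> bytes:
    """Lowercase octets 0x41-0x5A (ASCII A-Z); leave every other octet alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def dns_labels_equal(a: bytes, b: bytes) -> bool:
    """Octet-wise comparison that ignores ASCII case only (assumed behavior)."""
    return ascii_fold(a) == ascii_fold(b)


# 8-bit superset of ASCII (ISO 8859-1): ASCII letters fold, accented ones
# are compared as plain octets, so e-acute never matches the accentless e.
print(dns_labels_equal("CAF\u00e9".encode("latin-1"), "caf\u00e9".encode("latin-1")))
print(dns_labels_equal("caf\u00e9".encode("latin-1"), "cafe".encode("latin-1")))

# 16-bit encoding (UTF-16, big-endian): the bottom octet of U+0141 is 0x41,
# which looks like "A", and the bottom octet of U+0161 is 0x61, which looks
# like "a". Folding the octets makes two unrelated letters compare equal.
l_stroke = "\u0141".encode("utf-16-be")  # L with stroke
s_caron = "\u0161".encode("utf-16-be")   # s with caron
print(l_stroke, s_caron, dns_labels_equal(l_stroke, s_caron))
```

The first comparison succeeds and the second fails, which is the correct
binary behavior for an 8-bit superset of ASCII. The last comparison also
succeeds, although U+0141 and U+0161 are different letters: that is the
incorrect match described above.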

-- Christian Huitema