
Diacritical application in the DNS

2000-12-05 17:23:01
Greetings,

Martin Duerst duerst(_at_)w3(_dot_)org said:
It might be usable as a poor man's ASCII equivalent, 
but I strongly doubt that anybody will want to have
it on the Latin side of their name card.


Patrik paf(_at_)cisco(_dot_)com said:
I would, because I know that people in many parts of the world don't 
know how to enter "snömos" on their keyboard, and if I register the 
domain "snömos.se", I really want people to be able to get to

   http://www.snömos.se
So, if I think it is perfectly all right to have
   http://www.bq--abzw55tnn5zq.se

- - - - - - - - - - - - - - - - - -

Dan Kolis dank(_at_)hq(_dot_)lindsayelec(_dot_)com says:
Now we are getting down to the nuts and bolts of the feeling something's not
too great in this basket of goodies.

   http://www.snömos.se

Conceptually, and maybe legally in some jurisdictions, this obligates:

   http://www.snomos.se

And the obverse is true. Dealing with even a rudimentary understanding of
human factors implies these two have a mapping to each other.


So:
   http://www.snömos.se <============> http://www.snomos.se

   Entity one                          Entity two

Where the symbol <============> means "common destiny". This is reversible
in that one existing creates issues in the real world for the other. In some
purely theoretical space, there is no problem at all. This is repaired by:

   http://www.bq--abzw55tnn5zq.se 

Being a unique mapping of Entity one.

The suggestion Patrik paf(_at_)cisco(_dot_)com made to have:
   http://www.bq--abzw55tnn5zq.se

Appear as a pseudonym of Entity one in human readable printed correspondence
defeats the purpose of having a DNS. A dotted IP address is easier to use
and less error prone than a completely non-readable, hex-dump-like entry.

123.34.56.67 has got to be easier to enter than www.bq--abzw55tnn5zq.se

My question to Patrik is, (Q1) when your non diacritical capable (potential)
user enters:
   http://www.snomos.se

and hopes for the best, is it ok if they get your site? 

(Q2) Is it ok if the more savvy user entering this gets the same site?
  http://www.snömos.se

(Q3) Are you willing to pay for two domain names to make this happen?


The major reason ICANN jumped on internationalizing the DNS is political
correctness, not convenience to anyone, including those whose sole or favored
language is represented poorly in the existing system. Now, the suggestion
has been posed that this is not an IETF or "Internet intelligentsia" issue,
and ICANN or whoever can fight the trench warfare; e.g.: battle
cybersquatters hoping for entry errors, etc to make it work. Well, some
things can't be legislated into functionality, they can just be made to work
badly in a different way. For example, the Virginia legislature decided,
"for the purposes of Commerce", 175 years ago to fix Pi at 3.1.

This did not change the relationship of circumference to diameter.

Working with the <============> mapping can be achieved by many methods:

1) Blame non-technocrats for being computer illiterate, and ignore their
complaints.

2) Blame non-linguists for being language illiterate, for not understanding
the idiosyncrasies of 2500 languages.

3) Make certain things neo-illegal (UDRP says 'no' to some domain names
because others like them exist). Ex.: diacritical marks aside, they are 'the same'.

4) Use tort type 'law' to create liability for whoever is Nth (second,
third, etc) creating a misunderstanding.

5) Create DNS resolver software, which encodes human misunderstandings and
returns IP's based on some hierarchy of likeliness when an Entity (we are
already contaminating what constitutes an URN, URL) is not found.

6) Present redundancies to users (as in Patrik's workaround). Give them
more to poke with, hoping they get what they want via some trial and error.
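
To make method 5 concrete, here is a minimal sketch of such a "hierarchy of likeliness" in Python. It is an illustration only, not anything a real resolver does: the zone is a plain dictionary, and the names fold_marks and resolve are made up for this example. It tries the exact name first, and only when that is not found does it fall back to names that match once diacritical marks are stripped.

```python
import unicodedata


def fold_marks(name: str) -> str:
    """Lowercase a name and drop combining marks (assumed folding rule)."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def resolve(name: str, zone: dict[str, str]) -> list[str]:
    """Return addresses for name: exact match first, folded matches second.

    zone is a hypothetical in-memory table of registered name -> address.
    More than one result means the guess is ambiguous.
    """
    key = name.lower()
    if key in zone:
        return [zone[key]]
    wanted = fold_marks(name)
    return sorted(
        address
        for registered, address in zone.items()
        if fold_marks(registered) == wanted
    )


# Example zone using documentation-only addresses (192.0.2.0/24).
zone = {"www.snomos.se": "192.0.2.1"}
addresses = resolve("www.snömos.se", zone)
```

With only the unaccented name registered, the accented query falls through to the folded comparison and still finds it. If both forms were registered to different holders, the exact match would win, which is precisely where the trouble in methods 3 and 4 starts.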

--------------

I may have missed a coping mechanism above, but it's easy to see a problem
with each of those.

Since ICANN is such a new agency, the exuberance to "do the right thing" is
powerful, and the community should understand the good intentions behind the
proclamation. I have thought about this and have a suggested way to proceed
which has a pretty slim chance of being applied, (due mostly to timing, the
thinking here is probably frozen). Had this been suggested early on, it would
have seemed the obvious way to proceed rather than a source of trouble. Anyway, this is it:

Dan1) Carry all diacritical marks in non-ideographic languages and make a
simple 1:1 mapping to ignore them for comparison purposes. RACE remapping is
not used. RR entries can be in any human readable language as well. So for
example: 

This is an existing Icelandic ice cream vendor:

http://www.kjoris.is/

Now I risk discomfort for the anti-social act of attaching a 4K gif. It's
tiny, sorry to inconvenience you if either you hate attachments via email
reflectors, or if it's blocked. It's a slightly stylized logo for these ice
cream guys in Icelandic. This is my pieced together version of their logo
including diacritical marks unfamiliar to me. The closest I can get with a
French Canadian setup under a Microsoft OS is:

Kjörís

Fine. As a matter of principle I try it now:

http://www.Kjörís.is/
And just to be sure:
http://Kjörís.is/

Neither is found in the DNS. I get:
"The page you are looking for is currently unavailable. The Web site might
be experiencing technical difficulties, or you may need to adjust your
browser settings." as the human readable for this.

I think that this should work. Do you think the marketing manager agrees? I
will *more* than bet they do; (I will Email them and ask). Did you notice
the period above the lower case i is an accent? Ok.

Since I was looking at a stylized logo, I wasn't sure how to code the
diacritical mark over the i. If you decide to leap upon me as a total moron
(as per reason 2 above), I say: "Apparently your Icelandic is better than
mine; next time we meet let's conduct 100% of the session in ASL [AKA sign
language], with my apologies". Point being: we all know bits of other
languages; but no one knows all of all the other languages!

While seeking the program at Verisign to remap the above, I thought the URL
for the ICANN announcement might be useful to revisit. It is:

http://www.icann.org/announcements/icann-pr03nov00.htm

The conversion tool lives at:

http://mct.verisign-grs.com/


So entering the existing URL gets us:
   Input String Utf-8 www.kjoris.is 
   Prepared String Utf-8 www.kjoris.is 
   Registration String RACE www.kjoris.is 


and the carefully guessed at one:
   Input String Utf-8 www.Kjörís.is 
   Prepared String Utf-8 www.kjörís.is 
   Registration String RACE www.bq--abvwv5ts5vzq.is 
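
For anyone wondering where a string like bq--abvwv5ts5vzq comes from, the sketch below reproduces it in Python. It reflects my reading of the RACE draft and covers only the simplest case, where every character of a label sits in the same 256-character row of Unicode: the shared upper octet is written once, followed by each character's lower octet, and the result is base32 encoded behind the "bq--" prefix. The function names are made up for this example, plain lowercasing stands in for the real name preparation step, and the draft's other cases (mixed rows, the 0xFF escape) are deliberately left out.

```python
import base64

RACE_PREFIX = "bq--"


def race_encode_label(label: str) -> str:
    """Encode one label, single-row case only (a sketch, not the full draft)."""
    if all(ord(ch) < 0x80 for ch in label):
        return label  # plain ASCII labels are left untouched
    code_points = [ord(ch) for ch in label]
    rows = {cp >> 8 for cp in code_points}
    if len(rows) != 1 or max(code_points) > 0xFFFF:
        raise ValueError("this sketch handles only single-row labels")
    low_octets = [cp & 0xFF for cp in code_points]
    if 0xFF in low_octets:
        raise ValueError("the 0xFF escape is not handled in this sketch")
    payload = bytes([rows.pop()] + low_octets)
    # Standard base32 alphabet (a-z, 2-7), lowercased, padding removed.
    encoded = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    return RACE_PREFIX + encoded


def race_encode_name(name: str) -> str:
    """Lowercase a dotted name and encode each label separately."""
    return ".".join(race_encode_label(part) for part in name.lower().split("."))


registration = race_encode_name("www.Kjörís.is")
```

Here registration comes out as www.bq--abvwv5ts5vzq.is, the same string the Verisign tool reports above, and www.kjoris.is passes through unchanged.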


So, depending on how the protocol is implemented, I could get:

"The requested URL could not be retrieved
While trying to retrieve the URL: http://www.bq--abvwv5ts5vzq.is
The following error was encountered: 
Invalid URL "

Hmmm, maybe I should just ask for help at:

postmaster(_at_)bq--abvwv5ts5vzq(_dot_)is

via email! {Sorry}

So further;

Dan2) Languages substantially distant from the Utf-8 "prehistory before the
21st century" use the RACE mapping.

Revisiting the ICANN Announcement at:
http://www.icann.org/announcements/icann-pr03nov00.htm

"For several months, a working group of the Internet Engineering Task Force
(IETF) has been working to develop a standard specifying the requirements
for internationalized access to domain names. This standard, when it is
completed, will extend the operation of the Internet's Domain Name System
(DNS) to character sets other than ASCII (the only character set currently
supported) such as Arabic, Chinese, Japanese, Korean, Portuguese, the
Scandinavian languages, and Spanish.

Several experimental testbeds are in operation or have been announced. These
testbeds are testing a variety of approaches under consideration by the IETF
working group, but are part of a common commitment to converge on whatever
standard solution is ultimately adopted.

These developments, which could bring very significant changes to the way
the DNS can be used, have attracted remarkably little public notice."

So two things occur to me; (Last thing first). "Remarkably little notice",
is an understatement to say the least. Everyone with whom I discuss the Internet
knows about adding new TLD domain names. Not one seems to understand this
initiative. Most respond with: Huh? Why? What's this going to cost domain
holders?

Looking at this short list of: Arabic, Chinese, Japanese, Korean,
Portuguese, the Scandinavian languages, and Spanish.

I guess the "Scandinavian languages" didn't warrant breaking any of them out
by name, so let's assume this is all phased in over time. Leaving off
languages with UTF-8 inclusion that still don't let diacritical marks work
is interesting. There is actually low hanging fruit to yield on this
initiative before conquering the hard stuff. Anyway:

Hangul=Korean

Partial inclusion list where diacritical marks are carried; (e.g., Unicode,
whatever) and boiled down to a best efforts subset:

English, Portuguese, Italian, French, German, (these are examples, bear with
me).

So we could basically write these reductionist rules in a day or two. Since
they are known imperfect, they can only be compromises, anyway. Examples:

œ := oe
Æ := AE
Ç := C
ß := B
Ý := Y
ý := y
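
A rough sketch of how such a reduction table could drive comparison, in Python. The explicit entries are the ones listed above (the first entry is my reading of a character that did not survive transmission); everything else falls back to stripping combining marks after Unicode decomposition, which is my assumption about how the remaining "obvious" cases would be filled in. The names FOLD_TABLE, fold and same_name are made up for this example. The original spelling is never altered; folding is applied only at comparison time.

```python
import unicodedata

# Explicit reductions taken from the list above. Characters such as
# Ç, Ý and ý need no entry because decomposition already handles them.
FOLD_TABLE = {
    "\u0153": "oe",  # œ (assumed reading of the garbled first entry)
    "Æ": "AE",
    "ß": "B",        # as proposed above; a known-imperfect compromise
}


def fold(text: str) -> str:
    """Reduce text to its comparison form without touching the original."""
    pieces = []
    for ch in text:
        if ch in FOLD_TABLE:
            pieces.append(FOLD_TABLE[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        pieces.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(pieces)


def same_name(a: str, b: str) -> bool:
    """Two names are 'the same' if their folded, case-insensitive forms match."""
    return fold(a).casefold() == fold(b).casefold()


match = same_name("www.Kjörís.is", "www.kjoris.is")
```

Under these rules the ice cream vendor's accented and unaccented names compare equal, so either spelling could be printed on the carton and still land on the one registration.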

Now there are hard choices in all of this. For example, the handling of case
in Yugoslavian for "y", "Y", "j", "J", is so dissimilar to other languages
you wonder how long it takes the kids to get through second grade over
there. So, maybe some simple remapping will bother them. But since all the
visual characteristics are completely, totally preserved in my scheme, they
only have to care if an URL doesn't resolve. They get a choice between
wondering what happened to:

http://januara.org

Thinking it should be smart enough to try:

http://Yanuara.org

But the alternative is an error pointing to a RACE encoded DNS bounce:

{Actually, the Verisign program returns: "internal error for this!"}
?anuara = internal error back to my example.....

bq--ad6wc3tvmfzgc.org not found in DNS


-----------

Partial inclusion list for encoding in RACE form:
   Chinese, Japanese, Hangul, Arabic, etc


The tears of the Comp Sci guy:
Now the computer scientist in too many of us howls! Arbitrary! This cannot
be! No, actually it isn't and it, paradoxically, does not require knowing,
blessing, believing or condemning any particular human printed language. In
the total absence of an encoding requiring RACE, there are only two
possibilities:

1) It is UTF-8 without modification.
2) The equivalence table as per above applies.

So if I decide to have a domain name: 
   dinosaur dunce_hat Tilden Red_spot Double_umlaut_birdseed Omega_psi (dot
COM, I mean, we have to make a living here you know). It lands in this big
RACE pot and boils up a RACE string and there you go. 

Sorry again this is so long and even worse, includes an attachment. I also
want to say I appreciate the efforts of those designing all this stuff, and
I can just about feel the natural repugnance to having someone outside butt
in. However, you're designing something that's supposed to be used
by a lot of people who have a vested interest in whether it works, want it to
just work, and have no interest in how it works. I think it's possible to make it
look right AND work right.

Someone more in daily contact with managing the DNS can perhaps help me on
this. Only registrars handling a remapping; (not just RACE) who are the SOA
would have to change software... is that right? If that's not the case my
suggestion is in the best interests of end users, but hard to do. 

Complex, yet interesting.

Regards,
Dan Kolis








Attachment: KJALL.GIF
Description: GIF image
