ietf

Re: Update of RFC 2606 based on the recent ICANN changes?

2008-07-07 08:55:15
Is "сом" identical to "com"? (the first of these is U+0441
U+043E U+043C)
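
To make the question concrete, here is a minimal sketch (not part of any ICANN procedure) that lists the code points of the two strings and checks whether plain comparison or Unicode NFKC normalization treats them as equal. The helper name `describe` is made up for illustration.

```python
import unicodedata

cyrillic = "\u0441\u043e\u043c"  # the string from the message: U+0441 U+043E U+043C
latin = "com"                    # ASCII: U+0063 U+006F U+006D

def describe(s):
    """Return (code point, Unicode character name) for each character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]

# Plain string comparison works on code points, so the two strings differ.
same_codepoints = cyrillic == latin

# NFKC normalization folds compatibility variants, but it does not map
# Cyrillic letters onto Latin ones, so the strings still differ afterwards.
nfkc_equal = (unicodedata.normalize("NFKC", cyrillic)
              == unicodedata.normalize("NFKC", latin))

for cp, name in describe(cyrillic) + describe(latin):
    print(cp, name)
print("equal as code points:", same_codepoints)
print("equal after NFKC:", nfkc_equal)
```

So the two strings are not "identical" in any code-point sense; whether they are confusable is a separate, visual question.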

The current principle is that it should be a "confusing string",
which is vague enough to cover the case above (but perhaps not able to
cover .co).

"Similarity" can be defined and tested, by setting thresholds and the like, but "confusing" refers to a state of mind - something is "confusing" if the people who are likely to encounter it consider it to be confusing. There's no way to objectively define or test for "confusing" similarity without reference to how actual people respond to a particular string. That means either mining data collected from circumstances in which people have mistaken one string for another (perhaps a history of Google searches), or consulting a panel of real people whenever it is necessary to decide whether or not two strings are "confusingly" similar.
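
As a rough illustration of the "similarity with a threshold" half of that distinction, the sketch below maps a few look-alike characters to ASCII and then scores two strings with Python's `difflib`. Everything in it is an assumption made for the example: the three-entry `HOMOGLYPHS` table, the `skeleton` and `similarity` helpers, and the 0.9 threshold are hypothetical and do not come from any ICANN rule. It says nothing about whether real people would be confused.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-picked look-alike table (Cyrillic -> Latin).
# A real test would need a far larger, carefully reviewed mapping.
HOMOGLYPHS = {"\u0441": "c", "\u043e": "o", "\u043c": "m"}

def skeleton(s):
    """Replace each character with its assumed Latin look-alike, if any."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in s)

def similarity(a, b):
    """Score in [0, 1] comparing the look-alike skeletons of a and b."""
    return SequenceMatcher(None, skeleton(a), skeleton(b)).ratio()

THRESHOLD = 0.9  # arbitrary cut-off chosen for this example only

for candidate in ["\u0441\u043e\u043c", "co"]:
    score = similarity(candidate, "com")
    print(repr(candidate), round(score, 2), score >= THRESHOLD)
```

With this particular table and threshold the Cyrillic string scores as identical to "com" while "co" falls below the cut-off; a different threshold would give a different answer for "co", which is exactly why such a test measures similarity rather than confusion.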

(b) be identical to a Reserved Name;

(c) consist of a single character;

I've heard it argued repeatedly that this is an unreasonable
rule for ideographic characters. I don't have an opinion, but
hope that ICANN has considered that case in full detail.

This is where we dive into a discussion of what a "character" is. In
ideograph-based languages, there isn't a concept of a "word".

For example, Chinese, Japanese and Korean are actually "phonetic
languages", and ideographic characters are used to express the
phonetics. A "word", or more accurately a "morpheme", can be expressed
in one or more ideographs. A single Latin character is unlikely to be
useful by itself (except for "a" and "i"), but that's not the case in CJK.

If the condition is "no single ASCII character", I may be neutral
about it (since a single ideograph would never translate to a single
ASCII character in the zone file, due to the xn-- prefix), but if
"character" is defined more broadly to cover a "U-label" character, then
I would have strong objections.
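
The point about the xn-- prefix can be checked with a short sketch using Python's built-in `idna` codec. Note the assumptions: that codec implements the older IDNA 2003 rules (RFC 3490), and the ideograph U+4E2D is just an arbitrary example label chosen here, not one taken from the thread.

```python
# One CJK ideograph, i.e. a one-code-point U-label (example choice: U+4E2D).
u_label = "\u4e2d"

# Python's built-in "idna" codec (IDNA 2003) converts it to the ASCII
# form that would appear in a zone file.
a_label = u_label.encode("idna").decode("ascii")

print("U-label code points:", len(u_label))
print("A-label:", a_label)
print("A-label length:", len(a_label))
```

The U-label is a single code point, while the A-label always carries the "xn--" prefix and is therefore several ASCII characters long. A rule phrased in terms of ASCII characters and a rule phrased in terms of Unicode code points give opposite answers for such a label.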

At the moment, the condition is "no single Unicode code point." To the extent that a single CJK ideograph can be expressed using a single Unicode code point, this would represent the situation to which you say you would object. I will dig through my notes to find out why the "single character" condition was adopted -

- Lyman
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf