Is "сом" identical to "com"? (the first of these is U+0441
The current principle is that it should be a "confusing string",
which is vague enough to cover the case above (but perhaps not able to
"Similarity" can be defined and tested, by setting thresholds and the
like, but "confusing" refers to a state of mind - something is
"confusing" if the people who are likely to encounter it consider it
to be confusing. There's no way to objectively define or test for
"confusing" similarity without reference to how actual people respond
to a particular string. That means either mining data collected from
circumstances in which people have mistaken one string for another
(perhaps a history of Google searches), or consulting a panel of real
people whenever it is necessary to decide whether or not two strings
are "confusingly" similar.
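
To make the distinction concrete, here is a minimal sketch of the kind of
"similarity" test that can be defined mechanically. The look-alike table,
the function names and the 0.8 threshold are all assumptions made up for
illustration; a real check would need a far fuller data set. Note that it
only measures similarity; it says nothing about whether real people would
actually be confused.

```python
# Hypothetical, deliberately tiny look-alike table (Cyrillic -> Latin).
# It is an illustration only, not a real confusables data set.
LOOKALIKES = {
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u043c": "m",  # CYRILLIC SMALL LETTER EM
}


def code_points(label: str) -> list[str]:
    """List the code points of a label in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in label]


def skeleton(label: str) -> str:
    """Replace each character by its look-alike, if the table has one."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def similarity(a: str, b: str) -> float:
    """Fraction of positions whose skeleton characters match (0.0 to 1.0)."""
    sa, sb = skeleton(a), skeleton(b)
    longest = max(len(sa), len(sb))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(sa, sb))
    return matches / longest


def is_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Apply an arbitrary, assumed threshold to the similarity score."""
    return similarity(a, b) >= threshold


cyrillic = "\u0441\u043e\u043c"  # looks like "com" but is all Cyrillic
latin = "com"

print(code_points(cyrillic), code_points(latin))
print(cyrillic == latin, is_similar(cyrillic, latin))
```

The two strings are not identical as code point sequences, yet they pass
the threshold test. Whether that makes them "confusing" is exactly the
question the code cannot answer.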
(b) be identical to a Reserved Name;
(c) consist of a single character;
I've heard it argued repeatedly that this is an unreasonable
rule for ideographic characters. I don't have an opinion, but
I hope that ICANN has considered that case in full detail.
This is where we dive into a discussion of what a "character" is. In
ideograph-based languages, there isn't really a concept of a "word".
For example, Chinese, Japanese and Korean are actually "phonetic
languages", in which ideographic characters are used to express the
phonetics. A "word", or more accurately a "morpheme", can be expressed in
one or more ideographs. A single Latin character is unlikely to be
useful by itself (except for "a" and "i"), but that's not the case in CJK.
If the condition is "no single ASCII character", I may be neutral
about it (since a single ideograph would never translate to a single
ASCII character in the zone file, due to the xn-- prefix), but if
"character" is defined more broadly to cover a "U-label" character, then
I would have strong objections.
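
The difference between the two readings can be sketched as follows. This
is only an illustration: the function names are made up, and Python's
built-in "idna" codec implements the older IDNA 2003 rules, which is
assumed to be close enough for a single ideograph.

```python
def to_a_label(u_label: str) -> str:
    """Convert a U-label to its ASCII form using Python's built-in codec.

    The built-in "idna" codec follows IDNA 2003; it is used here purely
    for illustration.
    """
    return u_label.encode("idna").decode("ascii")


def is_single_ascii_character(u_label: str) -> bool:
    """Reading 1: the label as it appears in the zone file is one ASCII character."""
    return len(to_a_label(u_label)) == 1


def is_single_code_point(u_label: str) -> bool:
    """Reading 2: the U-label is one Unicode code point (len() counts code points)."""
    return len(u_label) == 1


ideograph = "\u4e2d"  # a single CJK ideograph, U+4E2D
a_label = to_a_label(ideograph)

print(a_label)
print(is_single_ascii_character(ideograph), is_single_code_point(ideograph))
print(is_single_ascii_character("a"), is_single_code_point("a"))
```

Under the first reading a single ideograph is never caught, because its
zone file form always carries the xn-- prefix. Under the second reading it
is caught in the same way as a single Latin letter.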
At the moment, the condition is "no single Unicode code point." To
the extent that a single CJK ideograph can be expressed using a
single Unicode code point, this would represent the situation to
which you say you would object. I will dig through my notes to find
out why the "single character" condition was adopted -
Ietf mailing list