Re: a way toward homograph resolution ?

Hi -

Let it suffice for me to say that I believe the gentleman is mistaken.
I do not intend to waste additional bandwidth on this thread.
Those interested in ltru and its work will find our charter at
http://www.ietf.org/html.charters/ltru-charter.html and our archives at
http://www.ietf.org/mail-archive/web/ltru/index.html

Randy, ltru co-chair

----- Original Message -----

From: "JFC (Jefsey) Morfin" <jefsey(_at_)jefsey(_dot_)com>
To: <ietf(_at_)ietf(_dot_)org>
Cc: <idn(_at_)ops(_dot_)ietf(_dot_)org>
Sent: Tuesday, May 10, 2005 9:08 PM
Subject: a way toward homograph resolution ? (was "improving WG operation")

On 04:43 11/05/2005, Randy Presuhn said:

From: "JFC (Jefsey) Morfin" <jefsey(_at_)jefsey(_dot_)com>

To: "Hallam-Baker, Phillip" <pbaker(_at_)verisign(_dot_)com>
Cc: <ietf(_at_)ietf(_dot_)org>
Sent: Tuesday, May 10, 2005 5:29 PM
Subject: RE: improving WG operation

...

They do not only delete. I suggest you just come to the WG-ltru, where
they have decided to document RFC 2277 charsets into RFC 3066 langtags. So
you can enjoy charset conflicts, something you never thought about, I
presume. You cannot stop progress.

...

I guess Jefsey is upset because the WG rejected his proposal
to expand our scope to include charsets.  The ltru WG is most
emphatically *not* confusing charsets with language tags.


I am not upset :-). On the contrary, I find it extremely interesting that some
people were able to rename charsets "scripts" in order to insert charsets
into language descriptions while claiming they don't (cf. above). Obviously
they are unhappy when I expose the trick. Anyway, the result is great fun:
people will be prevented from accessing a page they know how to read if they
do not know the language.


This cacologic however might be a good way to solve the IDN homograph issue
and the phishing problem.

If we revert from those famous "scripts" to what they are, i.e. Unicode
partitions, hence stable and well-documented charsets
(http://www.unicode.org/Public/4.1.0/ucd/Scripts.txt), browsers can use
them to expose the homographs in IDNs that are not related to the page
charset, and kill the risks of phishing.

This only calls for the browsers to extract the charset, I mean the script
name, from the langtag, fetch this file, read the list of code points in the
charset/associated with the script, and display the URL accordingly,
indicating the characters which are not part of the script/charset. This
relieves the ccTLD/TLD Manager of responsibilities he cannot fulfil at
3+level.
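
A rough sketch of those steps, for illustration only. It assumes the
"range ; ScriptName" line layout of Scripts.txt, uses a small hand-written
excerpt instead of the real file, and relies on a hypothetical table
(SUBTAG_TO_SCRIPT) to bridge the langtag script subtag and the name used in
the file. All function names are made up for this example.

```python
import re

# Hand-written excerpt in the "range ; ScriptName" layout of Scripts.txt.
# A real client would read the full file instead of this sample.
SCRIPTS_EXCERPT = """\
0041..005A    ; Latin
0061..007A    ; Latin
00AA          ; Latin
0400..0481    ; Cyrillic
"""

# Hypothetical mapping from a langtag script subtag to the script name
# used in the file (the two naming schemes differ).
SUBTAG_TO_SCRIPT = {"Latn": "Latin", "Cyrl": "Cyrillic"}

LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return {script name: [(first, last), ...]} from Scripts.txt-style text."""
    ranges = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = LINE_RE.match(line)
        if match is None:
            continue
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        ranges.setdefault(match.group(3), []).append((first, last))
    return ranges


def script_subtag(langtag):
    """Return the first four-letter subtag after the language, or None.

    Simplification: private-use and other edge cases are not handled.
    """
    for part in langtag.split("-")[1:]:
        if len(part) == 4 and part.isascii() and part.isalpha():
            return part.title()
    return None


def foreign_chars(label, langtag, ranges):
    """List (index, char) pairs in label that fall outside the tag's script."""
    script = SUBTAG_TO_SCRIPT.get(script_subtag(langtag))
    if script is None:
        raise ValueError("no known script subtag in %r" % langtag)
    allowed = ranges.get(script, [])
    return [
        (i, ch)
        for i, ch in enumerate(label)
        if not any(lo <= ord(ch) <= hi for lo, hi in allowed)
    ]


if __name__ == "__main__":
    table = parse_scripts(SCRIPTS_EXCERPT)
    # The second letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
    print(foreign_chars("p\u0430ypal", "en-Latn", table))
```

With this excerpt, digits and hyphens would also be flagged, since only
letters are listed; a real check would have to decide how to treat
characters shared by several scripts.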

There are however still (minor) points to address:
- there are some minor disparities between the "script" name in the
langtag and the script name in the Scripts.txt file, which should be reduced
over time. I suppose that if this is a major issue, there will be help.
- the Scripts.txt file is currently hosted on the Unicode site. Even if it
is cached (92 K), it will be fetched every time people start their
browser. This may therefore represent several billion accesses a day.
- the WG-ltru only really wants to address XML issues related to old XML
libraries. Some coordination with other WGs or interests could be fruitful.
They plan to extend the language tags registry to scripts and to register
them. I suppose other WGs could benefit from this (all those involved in
one way or another with internationalisation and languages).

jfc






_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf




