ietf

Last Call: language root file system

2005-08-27 04:34:03
http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-04.txt

The proposed RFC 3066bis Draft currently under review in effect creates an "LTS" (language tag system) that resembles the old HOSTS.TXT system.

It consists of:
- a standards-track description calling for tag resolution libraries such as http://www-306.ibm.com/software/globalization/icu/index.jsp .
- a registry, managed by ietf-languages(_at_)alvestrand(_dot_)no
- a root file: its initial version is given by the Draft above (under Last Call), to be hosted on the IANA servers.

The terse Draft version of the "langroot" is 80,430 characters (for comparison, the DNS root is 63,858 today). Zipped, it is 11,413 characters, against 17,320 for the DNS root. The DNS root is updated around 60 times a year. The langroot is probably updated with new langtags at a similar rate. The two files are therefore of comparable magnitude today.

However, there will be a sharp increase next year when all the ISO 639-3 language codes are added, if ISO accepts the standard by the end of the year, as its authors expect. The number of languages will rise from 500 to 7,450. When ISO 639-6 is accepted later, there will be 20,000 more language subtags. Additional regional tags may follow (for example ISO 3166-2, E.164, X.121 or UN zones), which could multiply the size by two or three.
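As a rough illustration of the growth described above, the sketch below projects the file size from the figures quoted in this message. It assumes that the size scales linearly with the number of language subtags, which is a simplification of mine and not something the Draft states. The function names are hypothetical.

```python
# Figures quoted in the message above.
CURRENT_SIZE_CHARS = 80_430    # terse Draft version of the langroot
CURRENT_LANGUAGES = 500        # languages covered today
ISO_639_3_LANGUAGES = 7_450    # languages once ISO 639-3 is added
ISO_639_6_EXTRA = 20_000       # further subtags expected from ISO 639-6


def growth_factor(languages: int) -> float:
    """Ratio between a future language count and today's count."""
    return languages / CURRENT_LANGUAGES


def projected_size(languages: int) -> int:
    """ASSUMPTION: file size grows linearly with the language count."""
    return round(CURRENT_SIZE_CHARS * growth_factor(languages))


if __name__ == "__main__":
    scenarios = [
        ("after ISO 639-3", ISO_639_3_LANGUAGES),
        ("after ISO 639-6", ISO_639_3_LANGUAGES + ISO_639_6_EXTRA),
    ]
    for label, count in scenarios:
        print(f"{label}: x{growth_factor(count):.1f}, "
              f"about {projected_size(count):,} chars")
```

Under that linear assumption the file grows by a factor of about 14.9 with ISO 639-3 (roughly 1.2 million characters) and about 54.9 with ISO 639-6 as well (roughly 4.4 million characters), before any additional regional tags are counted.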

The users' need to obtain updated langroot information is equivalent to that for the DNS root (see below), with the following differences:
- there is no intermediary service like ISP nameservers: the users will call the langroot servers directly (hence my comparison with HOSTS.TXT)
- there is to date no cache solution, and no TTL has been considered yet.
- the number of user calls to the registry data (not necessarily to the langroot server) will be higher than for the DNS. The DNS supports less than 50% of Internet calls. Langtag resolution will be needed for every HTML page, XML document and email being read.
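To make the missing cache and TTL point concrete, here is a minimal sketch of what a client-side cache could look like. Nothing like this is specified in the Draft: the class name, the `fetch` callable and the TTL value are all hypothetical, and the fetch is left to the caller so that no particular server or URL is implied.

```python
import time
from typing import Callable, Optional


class CachedRegistry:
    """Hypothetical local copy of the registry, refetched only after a TTL."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # caller-supplied download function
        self._ttl = ttl_seconds      # lifetime of the local copy
        self._clock = clock          # injectable so the cache can be tested
        self._data: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached copy, refetching it if absent or expired."""
        now = self._clock()
        if self._data is None or now - self._fetched_at >= self._ttl:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```

With a TTL of one week, a library reading thousands of tagged documents a day would contact the registry source at most once a week, instead of once per document.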

There is no way to know how developers will address the langtag resolution need. But the foreseeable evolution of the IANA registry and the needs of the market argue for an XML or ASN.1 solution. The clearing-house architecture is unclear at this stage, since the proposed Draft does not discuss it. This is left to the IANA (hence my interest in the exact timing). The drawback is that if a versatile and fast solution is available, programmers will tend to use it in real time. This means a system comparable to the DNS root, with a root file that may be 40 to 50 times larger, called 100 to 500 times more often. (I know that 97.5% of DNS root calls are illegitimate, and that the langroot will not have the same history and will take time to grow, which will certainly reduce that last figure drastically at first; but in the long run the figure stands.)

The problems are the analysis of this system (its real need, its alternatives, its architecture) and of its cost. Today the IANA servers are not used for real-time dynamic operations. Even if users cache their 12.000 to 600.000 k zip file when they boot, or accept an update every week or month, we are in the logic of an anti-virus update. The IANA system can certainly support this by using an Akamai-like solution, but we would be entering a commercial approach. Is that what we want?

My proposal is to use IETF Drafts. The current Draft would become a permanent solution, with an updated Draft issued every month (this would probably give a shorter delay than the IANA and would stay under the full control of a dedicated IETF WG). The Internet community is used to relying on the RFC Editor, no one accesses it in real time, and the new ISOC/IASA organisation provides a reliable and trusted structure. Updates would be provided once a month, possibly as a diff or via rsync, to ISPs and from ISPs to users.
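The monthly diff idea can be sketched as follows. This is only an illustration under stated assumptions: the registry is modelled as a plain mapping from subtag to description, and the delta format (a dictionary of added or changed entries plus a list of removed subtags) is hypothetical, not taken from the Draft.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: in-memory form of the registry, subtag -> description.
Registry = Dict[str, str]


def make_delta(old: Registry, new: Registry) -> Tuple[Registry, List[str]]:
    """Compute what changed between two monthly versions."""
    changed = {tag: desc for tag, desc in new.items() if old.get(tag) != desc}
    removed = sorted(tag for tag in old if tag not in new)
    return changed, removed


def apply_delta(old: Registry, changed: Registry,
                removed: List[str]) -> Registry:
    """Rebuild the new version from the old one and a delta."""
    result = {tag: desc for tag, desc in old.items() if tag not in removed}
    result.update(changed)
    return result
```

A distributor would publish only the output of `make_delta` each month, and each ISP or user would run `apply_delta` on its local copy, so the full file would be transferred only once.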

This also means that the "Draft as a default, not as an exclusive" solution I propose will permit simple integration of specialised schemes: in addition to, or instead of, the monthly WG Draft data, libraries will use the data from whichever scheme they want. This approach is no-cost, off the shelf, proven, immediate, and conflict-free because it is totally open and distributed.

NB: one could comment that the IANA could provide a simple ASCII jar file that some other service would distribute (as in my proposal). This is what we can actually expect. If my "Draft as a default, not as an exclusive" proposal were not accepted, this other service would be the de facto exclusive distributor of the exclusive file. It would then probably be financed by selling related services and information: this would build a commercial core for the Multilingual Internet.

jfc


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf