
Re: Last Call: language root file size

2005-08-28 06:58:23
Last Call: http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-04.txt

I have started documenting some of the problems resulting from the expected size of the language tag registry and from the capacity of the langtag solution to fulfill the WG-ltru Charter. Here are two comments that Doug Ewell, the author of the Draft above, posted on the WG-ltru list:

- "I've already built a hypothetical RFC 3066ter registry. The changes alone add up to 35,700 lines, or more than 740 pages in RFC format. It might reopen the question of whether an I-D is the best vehicle for delivering this amount of information to IANA."

- "I still have significant concerns about the assumption that ISO 639-6 will be, or should be, automatically integrated into a language tagging scheme. [snip] Meanwhile, the claim that there are "over 20,000 languages" to be tagged is being used as an argument against the current RFC 3066bis effort and the plan to support 7,600 languages in RFC 3066ter."

I fully share the concerns of Doug Ewell. The only difference is one of timing. I raised questions that the WG took as opposition and that he now discovers as problems. Had the WG-ltru first analysed its Charter, we would have identified them long ago. I list some in the annex.

The Charter says: "The RFC 3066 standard for language tags has been widely adopted in various protocols and text formats, including HTML, XML, and CLDR, [... the first document] is also expected to provide mechanisms to support the evolution of the underlying ISO standards, in particular ISO 639-3, mechanisms to support variant registration and formal extensions, as well as allowing generative private use when necessary."

1. BTW the Charter speaks of "adopted _standard_" not of "generalised _practice_".

2. The documented upgrade enlarges the size of the registry from 80 K (11.5 K zipped) to 650 K (100 K zipped). This information, with its updates and additions, MUST be available to every on-line application on the devices of billions of users. The Draft does not explain how.

3. One of the authors has _legitimate_ concerns about the capacity of the proposal and about how reasonable it is for the Charter to expect support for ISO 639-6 as an evolution of the underlying ISO standards.

But he is wrong in assuming that I use this as an argument against the current RFC 3066bis effort. On the contrary, I use it as an argument to support the proposed Draft as the default solution, and to support extensions and practical information distribution through other, adapted solutions introduced by a singleton. The comment given by Debbie Garside, the author of ISO 639-6, shows there is no cause for concern about our plans and developments using ISO 639-6 [except a two-month delay on our projected calendar], in line with ISO 11179 (which the WG-ltru and the author of ISO 639-3, Peter Constable, chose to disregard).

I have no problem with specifying this extension capacity through a specialised Draft, provided that Draft is a complement, not a competitor. But I would prefer to work on a sample ISO 639-6 code and get inputs from the WGIG multilingual forum under discussion. The three approaches (not even alluded to by the Charter) address three different layers:
- practical, script oriented needs (ISO 639-1, 2, 3, ISO 15924)
- comprehensive multimedia, multimodal and multitechnology needs, and extension to information systems (ISO 639-6)
- lingual community networks (privateuse systems)

There are years of work ahead, which I would prefer to see done within the IETF rather than against it.
jfc

Annex: a quick list of questions that need to be addressed.

- the nature and the size of the involved information
- the size and the frequency of its additions, which are not documented. However, we know that the I-D bases are just the support for additions.
- the availability of this information, additions and updates to the users
- the redundancies of this information between three different visions of language tagging, supported by
  - ISO 639-3: a bare list of 7450 languages calling for variant information
  - ISO 639-6: a list of 20,000 pre-formed language tags meant to deliver language information to ISO 11179 systems
  - the real user systems planned to be supported by the Draft "privateuse" solution
  and their correlations (bridges, equivalence, conflict resolution, etc.)
- the additional related information such as "locale" files (quoted in the Charter as CLDR, the Unicode project to publish all the locales of all the OS, directed by one of the authors of the Draft). This related information is actually without limit, as it may relate to every other information system (ISO 11179). However, efforts like ISO 11179 have not yet addressed two major points:
  - networking, which represents a huge area of research and applications (interoperation of an ISO complete system per... user avatar)
  - multilingualism, which means that everything supported in ASCII English must be able to be supported in every language
- the change in the nature of languages, texts and scripts brought by multilingual, multitechnology, multilateral computer supported relations
- the multimodal (script, oral, sign, etc.) nature of the internet exchanges
- the architectural evolution to support the "brain to brain" interintelligibility (languages) and the community interculturation (vernacularisation) layers.
- the security issues this raises: privacy violations and intelligence leaks.
- the purpose and architecture of the applications and services using language identifications
- the relations to cultures and content, UNESCO, GAC
- the identification, information and adhesion of lingual and culture experts and publics
- the correlation with MPEG, MINC, ITU and other SDOs
- the governance and the distribution of the language information system
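
To give an order of magnitude for the size questions listed above, here is a minimal sketch that only redoes the arithmetic on the figures quoted in this message (80 K and 650 K raw, 11.5 K and 100 K zipped, 35,700 lines over 740 pages). The one-billion user count is an illustrative assumption standing in for "billions of users", and 1 K is taken as 1000 bytes; neither comes from the Draft.

```python
# Figures quoted in the message; sizes are in K, taken here as 1000 bytes (assumption).
CURRENT_RAW_K, CURRENT_ZIP_K = 80.0, 11.5
UPGRADED_RAW_K, UPGRADED_ZIP_K = 650.0, 100.0
CHANGE_LINES, CHANGE_PAGES = 35_700, 740

# Illustrative assumption only: one billion users each fetching one zipped copy.
ASSUMED_USERS = 1_000_000_000


def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


raw_growth = growth_factor(CURRENT_RAW_K, UPGRADED_RAW_K)
zip_growth = growth_factor(CURRENT_ZIP_K, UPGRADED_ZIP_K)
lines_per_page = CHANGE_LINES / CHANGE_PAGES

# Total volume, in terabytes, of one full zipped copy per assumed user.
total_terabytes = UPGRADED_ZIP_K * 1000 * ASSUMED_USERS / 1e12

print(f"raw growth factor:    {raw_growth:.2f}")
print(f"zipped growth factor: {zip_growth:.2f}")
print(f"lines per RFC page:   {lines_per_page:.1f}")
print(f"one full distribution: {total_terabytes:.0f} TB")
```

Under these assumptions the registry grows roughly eightfold, and a single full distribution is on the order of 100 terabytes before counting any update or addition.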


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf