
Re: Last Call: language root file system

2005-08-27 20:38:34
Dear David,
I am afraid this debate leads nowhere, because you assume that solutions are already implemented instead of considering how to implement them, as when you say "operating systems will start providing ... so that only an OS update ....".

Everyone would be happy if the DNS were supported by the OS (MS said they would support IDNA in Windows should the market show it to be the proper thing to do ....). Louis Pouzin defined the mail concept in 64, Tom Van Vleck developed it in 65. The first spam was in 67 (ask Tom). Ever since, people have been saying they have the anti-spam solution...

However, you have a solution: Unicode. Why not? But you need to document it.

At 00:23 28/08/2005, David Hopwood wrote:
JFC (Jefsey) Morfin wrote:
At 18:11 27/08/2005, David Hopwood wrote:
JFC (Jefsey) Morfin wrote:

[...] The DNS root is updated around 60 times a year. It is likely that the langroot is currently similarly updated with new langtags.

No, that isn't likely at all.
Dear David,
Your objection is perfectly acceptable, but it should be documented.

For the long-term, sustained rate of updates to the registry to be 60 a year,
there would have to be real-world changes in the status of countries or in
the classification of languages and scripts that occurred at the rate of 60
a year (i.e. every 6 days). And even in times of significant political
upheaval, that is simply implausible.

Please stop removing the responses I already gave. I documented that this rate, for a database meant to add all that is missing, is a very, very low rate. It is not by denying reality that you build credibility for your proposition. You should document the number of yearly changes in the Unicode files: consult http://www.unicode.org/versions/ - there are 20 versions online.

The order of magnitude is the same. I did not note the number of entries in the IANA file over the last few months. This is something that I will certainly maintain if the registry stabilises.

Exactly; the registry has not stabilised. It will do, but until it does,
there is little point in arguing statistics on how frequently it is updated.

??? Nobody is arguing, are you? There is a problem which is to be assessed, documented and addressed. Although it is meant to be a BCP, the Draft does not document this (it was discussed, not addressed, and decided to be out of scope). The responsibility has been left with the IANA. The IANA is all of us. This is for the IESG to decide, and to bear the responsibility.

I documented that stabilisation means tens of thousands of entries. When (if) such a stabilisation occurs, the problem of size will be even more important.

The langtag resolution will be needed for every HTML, XML, email page being read.

Patent nonsense. In practice the list will be hardcoded into software that needs it, and will be updated when the software is updated.
So? Langtag resolution is the translation of the langtag into machine-understandable information. It will happen every time a langtag is read, just as domain name resolution is needed every time a URL is called.
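To make "langtag resolution" concrete, here is a minimal Python sketch of what translating a tag into machine-understandable information could look like. It is only an illustration under stated assumptions: the three small tables are hypothetical stand-ins for the registry, the accepted shape (language, optional script, optional region) is a simplification of what the Draft allows, and the names resolve_langtag and ResolvedTag are invented for this example.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, tiny stand-ins for the registry; the real data is far larger.
LANGUAGES = {"en": "English", "fr": "French", "uk": "Ukrainian"}
SCRIPTS = {"Latn": "Latin", "Cyrl": "Cyrillic"}
REGIONS = {"GB": "United Kingdom", "FR": "France", "UA": "Ukraine"}


class ResolvedTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified shape only: language[-Script][-REGION]
_TAG = re.compile(
    r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?$"
)


def resolve_langtag(tag: str) -> Optional[ResolvedTag]:
    """Return the meaning of a tag, or None if it cannot be resolved locally."""
    match = _TAG.match(tag)
    if match is None:
        return None
    lang, script, region = match.groups()
    lang = lang.lower()
    script = script.title() if script else None
    region = region.upper() if region else None
    # Any subtag missing from the local tables makes the whole tag unresolvable,
    # which is exactly the case where stale local data hurts.
    if lang not in LANGUAGES:
        return None
    if script is not None and script not in SCRIPTS:
        return None
    if region is not None and region not in REGIONS:
        return None
    return ResolvedTag(
        LANGUAGES[lang],
        SCRIPTS[script] if script else None,
        REGIONS[region] if region else None,
    )


print(resolve_langtag("uk-Cyrl-UA"))
print(resolve_langtag("en-GB"))
print(resolve_langtag("xx"))  # unknown language subtag: not resolvable
```

Whether the tables are hardcoded, shipped with the OS, or fetched from a network service, the lookup step is the same; the disagreement in this thread is about how the tables are kept current.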

The langtags would already be encoded in a form that can be interpreted
directly by each application.

I do not understand what this means. The Draft is about exactly that. What is discussed here is the update of each application.

You were trying to imply that repeatedly downloading this information would impose significant logistical costs:

# Even if the user cache their 12.000 to 600.000 k zip file when they boot,
# or accept an update every week or month, we are in the logic of an
# anti-virus update.

I am not trying to imply anything. I document that applications (many possible applications) need to access data from a big database. We need the simplest, most secure, least costly, most stable, most open to innovation, fastest and most efficient way to give that access. Because I am among those who will pay for it and who will be blocked if it does not work, I am concerned, as I see no credible solution.

I am not interested in your "no"s, but in "how"s from you or the IESG that I could be happy with.

In fact there is unlikely to be any additional cost apart from that of
upgrading software using existing mechanisms.

Like updating mail servers to anti-spam solutions, ISPs to IPv6, IE to IDNA?

This is perfectly sufficient. After all, font or character encoding support for new scripts and languages (e.g. support for Unicode version
updates) has to be handled in the same way.
I am afraid you confuse the process with the update of the necessary information. And you propose, in part, the solution I propose :-).

If it is sufficient to upgrade software using existing mechanisms, then there is no problem that is not already solved.

OK. Are you implying that the Unicode solution is your solution?

 Languages, scripts, countries, etc. are not domains.
The DNS root tends to be much more stable. What counts is not the number of changes, but their frequency. There is no difference between ccTLDs and country codes. We can probably say that there is at least one change a year.

What happens if the change isn't immediately picked up by all software?
Not much. Only use of that particular country code is affected.

Thank you for the "not much" on behalf of the affected country code. Let us say that at some point we have to switch from "uk" or "gb" to "en", with all the changes that would entail. Not much of a problem if only England is affected?

Likewise, is it no big deal if the DNS root is not kept updated? The OS could update it on computers every now and then (ICANN sometimes has a four-month delay)? So, let us suppress the root servers: I do not object to this, but I wish to know whether this is your proposal.

I proposed to use the DNS to support that information. This was opposed. Do you think your Unicode-like solution is better?

[...]
Now, if there are updates, this means there is a need to use them now, not in a few years' time.

And if they do, they will upgrade their software -- which is what they
have to do anyway to actually make use of any new localisations, scripts,
etc.

The problem is not with people upgrading. The problem is with the servers providing this upgrade. If all current users upgraded their current langroot file once a year, spread over the whole year so as to avoid any peak, with no errors, no DoS, etc., this would represent 400 K a second today.

In reality this would represent peaks of at least 4 Meg. It is up to the IESG and the IANA to say whether they can take the load and the risk. Size increases, frequency increases and DoS risks probably call for 400 Megs.
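The averaged figure can be checked with a short back-of-the-envelope calculation. The Python sketch below uses the numbers given in this message (one billion users, a file of 15 K today and 600 K later, one update per user per year); it assumes that "K" means kilobytes of 1,000 bytes and that downloads are spread perfectly evenly, which is the most optimistic case. The function name is invented for this example, and the peak factors (4 Meg, 400 Megs) are estimates that this arithmetic does not derive.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def average_load_bytes_per_second(users, file_bytes, updates_per_year=1):
    """Average server output if every download is spread evenly over a year."""
    return users * file_bytes * updates_per_year / SECONDS_PER_YEAR


USERS = 1_000_000_000  # "one billion users"

today = average_load_bytes_per_second(USERS, 15_000)     # 15 K file
future = average_load_bytes_per_second(USERS, 600_000)   # 600 K file

print(f"15 K file, yearly update:  {today / 1_000:.0f} kB/s on average")
print(f"600 K file, yearly update: {future / 1_000_000:.1f} MB/s on average")
```

Under these assumptions the 15 K case comes to a little under 500 kB a second, the same order of magnitude as the 400 K a second quoted above; any real distribution would be burstier than this average.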

Again, the DNS would most probably distribute the load dramatically. The CRCs (Common Reference Centers) I work on are meant to support this without problem, and to add a lot of value. But it seems this is where the opposition we face lies. The authors, like you, favor a Unicode-oriented solution.

PS. The problem is: one way or another, one billion users, with various systems and appliances, must get reasonably well-maintained related information, which today weighs 15 K and is going to grow to 600 K at some future date,

The subset of the information needed by any particular application will
typically be much less than 600K. If there is a real issue of database size,
operating systems will start providing shared libraries to look up this
information, so that only an OS update is needed (and similarly for the
Unicode data files, which are already significantly more than 600K).

This is like saying that the subset of DNS information a user will typically need is much less than the hundreds of millions of FQDNs.

Do you mean that if there is a problem, a langserver system will be developed in a rush outside of any standard, so the standardizers need not bother? And that, in any case, we can copy Unicode?

Then I would be interested in the Unicode solution: in Unicode's traffic data and user support, in the Unicode distribution system, and in the way people maintain their Unicode applications (I suppose there is a reason why they maintain 20 releases online).

But I am afraid you confuse the usage of Unicode data with that of language identification data. This is not a table; it is a network cross-negotiation, calling for far more information. The currently discussed Draft on filtering/matching shows the versatility users should obtain...

with a change from every week to every day (IMHO much more as people start mastering and adapting a tool currently not well suited to cross-lingual exchanges). From a single source (in the exclusive case) or from hundreds of specialised sources in an open approach. This should not be multiplied by all the languages that will progressively want to support langtags, but will multiply the need by two or three. For example a Ukrainian will want langtags in Ukrainian, in Latin and Cyrillic scripts [...]

You pick one of the very few languages that are written in more than
one script, and use that example to imply that the total number of
language-script combinations used in practice is 2 to 3 times the number
of languages. Please stop exaggerating.

??? This is an odd remark! When you develop a solution, you do not aim to support the needs a minima; you support them a maxima. And this example is not really a big one. Just think what the demand could be for an international multilingual directory....

The question is whether the Unicode solution can be adapted to the IANA, or replace it if needed; whether it scales or costs too much; and whether it can do so without making the Internet dependent on commercial solutions. Will you guarantee this?

jfc


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf