
Re: Last Call: 'Tags for Identifying Languages' to BCP

2005-08-29 09:28:40
Dear all,
at this stage I think it is clear that the langtags issue represents a strong opposition between two visions of the Multilingual Internet. For better or for worse, these visions are embodied by Peter Constable's friends and by me.


There is an affinity group, gathered by circumstance or by talent, that supports Peter's approach. Its kernel happens to be formed of English mother-tongue people employed by large corporations or interests (from its history, it seems to have formed in the course of international meetings). A few members are included through personal dedication or as consultants. There are no academic researchers, no publicly funded contributing projects, no sponsoring cultural organisations. The members of this affinity group share a common culture. It is based upon different levels of technical involvement of the structures and individuals involved. It involves no R&D in the network area which is not sponsored by commercial interests, with the pros and cons described in RFC 3869. In that sense it can be said to be a US industry-led group. This is at least the way the non-US interests, organisations and government officials I have discussed with identify them, without exception. True or not, this is the perception. It is to be related to the definition of an IETF affinity group in RFC 3774.

This group proposes a tagging of all the languages of the world, which it perceives as a commodity (a well-known trait of English mother-tongue people, who share their own language with other people around the world). This approach certainly suits e-commerce, basic interoperability and the library classification of foreign books. The idea is that a standard and a central registry will constrain the world to follow a common useful rule, if it cannot continue using ASCII English. This is named "internationalisation". This unilateral standardisation is seen as the only guarantee of the stability and unicity of the network. Being unique for the entire world, this tagging must be simple and based upon simple information. This information is made of three elements that commerce needs for practical reasons: the written language, the script being used, and the applicable law.

This vision A addresses specific urgent needs of the printing and library industries to reduce costs in the face of competition from other media and from the printing capacity of every user (a problem less documented than, but as important as, the music industry's problem), with a larger financial turnover. World concentrations and specialisations can be expected from a unique normative system (with all the reluctance one can expect and the strategies one may imagine).


There is a tissue of relations I have woven among people engaged in network research, operations management, cultural life, government administration, international entities, language-oriented interests and activities, and local industry, from various parts of the world, in particular through an Internet test-bed named dot-root (responding to the ICANN ICP-3 call), a long involvement in @large and ccTLDs, and a national Internet community and government think tank which I started one year ago and which is developing unexpectedly. The strength of this relational group is that no money is engaged, which guarantees its independence. But this is also its weakness, as it leaves no alternative other than to rely on volunteers to represent it (often only one when the task is as demanding as this one), or to call on the personal involvement of concerned people, with the risk of overwhelming the Internet standards process with scores of irritated newcomers. The common culture of this group is common-sense support for a user-centric multilingual architecture and strong sustainable innovation.

This group sees no need to tag languages but a need to document relations, which, among other things, use languages but also many other parameters. It thinks that every human being, machine and service is specific and different from the others, and that surety, security, stability and innovation capacity are based upon the best seamless support of these differences, for a strong unity of the network. It has found by experience that the generalisation of computing and pervasive networking support a realistic, commercially rewarding and humanly exciting set of possibilities. This concerns relations, culture, economy, and social and political development that everyone, every economy and every country may share in, on an equal-opportunity basis. It also sees a global convergence of the R&D, civil society, economic and political spheres in that direction (for example at WSIS, but also at the IETF), expressed in various directions, one being information conceptual networking (ISO 11179 R&D) and another a fluid referencing system (URI tags), which give new possibilities, especially when added to physical and services networking.

This vision B calls for an open description system/language of languages, and of many other relational parameters. Obviously it is still in its infancy, as everything started in the early 80s has been delayed by the subsequent OSI and then Internet visions, and by hardware and bandwidth limitations and costs. It is only resuming now.

Vision A has difficulty understanding vision B (and a lack of competence, which Peter helped to document yesterday). And as usual in such cases, it fights the messenger. No big deal: the messenger is used to it.

Vision B has no problem in accepting vision A as a "default" for those wanting it. However, vision A is centralised and vision B is distributed. So, vision A thinks it needs to be unique to exist and fulfil its purpose. This is why Vision B proposed several things:

- to define an exclusive area of application for Vision A. This was done from the second Last Call on, by proposing that the authors add wording stating that the area of application was the areas already covered by RFC 3066 and documented further on.
- to protect Vision A from confusion. This was done by pushing the authors towards a very strict ABNF that avoids tag creep.
- there may be other propositions to study. These are however not easy to uncover, as Vision A has difficulty with the architectural evolutions (network, content, relational elements) all this technically implies.

As I explained, there are three scenarios:

1. Vision A is denied by the IESG. Progressively, vision B imposes itself through new RFCs or through a grassroots (international) process. The current basic needs are not properly addressed. The credibility of the IETF is at stake, as with spam, IDNA, etc. This causes delay.

2. Vision B is denied by the IESG. But vision B is already accepted through the URI-tags RFC. It will develop in opposition to Vision A. This will cost everyone money and delays; the Multilingual Internet will move outside of the IETF or balkanise.

3. Vision B is included in Vision A as a community private use. This scheme is simple to understand and to include in the RFC 3066 Bis document in two lines. It does not break any of its principles.

- the document is unchanged and addresses the general need, whatever it may be.
- "x-" is unchanged. Its role is to support private use schemes, within private spaces.
- "0-" is added from the reserved singleton pool. Its role is to support community private use schemes, that is, cases where a user community wants to document languages in its own way. The need is to support, in a non-conflicting way, two pieces of information:
       - the community scheme identification
       - the identification within that scheme.
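
The two pieces of information listed above could be carried as in the following minimal Python sketch. The proposal does not specify the layout of the subtags after "0-", so the layout used here (first subtag names the community scheme, the remaining subtags identify the language within that scheme) is purely an assumption for illustration, as are the function name and the example tag.

```python
def split_community_tag(tag):
    """Split a tag at a hypothetical "0" singleton.

    Assumed layout (not defined in the proposal):
        <general part>-0-<scheme>-<identifier...>
    Returns (general_part, scheme, identifier); scheme and identifier are
    None when the tag has no "0-" section. Private use ("x-") sections are
    not treated specially in this sketch.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "0" not in lowered:
        return tag, None, None
    i = lowered.index("0")
    rest = subtags[i + 1:]
    if len(rest) < 2:
        raise ValueError("expected a scheme and an identifier after '0-'")
    return "-".join(subtags[:i]), rest[0], "-".join(rest[1:])


# Hypothetical example: scheme "myscheme", identifier "dialect1".
general, scheme, identifier = split_community_tag("fr-0-myscheme-dialect1")
```

The general part stays readable by software that only knows the unchanged document, while the scheme name keeps identifiers from different communities from colliding.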

I think this respects all the requirements of Vision A and permits a full development of Vision B. There are two possibilities for supporting the "0-" space: either to develop a new system or to use an existing one.

I have no particular opinion, except that the solution MUST be decentralised (community centralised). I started by thinking we had to develop a new one, waiting for the review of the WG-ltru charter both to make sure the proposition would fully respect Vision A and to learn the Vision B points we would have overlooked (there probably are many). This created problems for the WG, which only wanted to block Vision B, which it still does not understand or which it opposes.

Then we found the not yet numbered URI-tag RFC. It seems to address all the needs, and more than the needs, except multilingualisation. My intent is therefore to document an IRI-tag along the lines of the URI-tag when this debate has stabilised and the URI-tag RFC has been published. I have no problem working on it within the WG-ltru.

What next? Vision A alone is harmful to all. If it were accepted, it would be appealed: to the IETF Chair on grounds of architectural common sense; to the IESG for lack of compatibility with the Charter and other RFCs; to the IAB if necessary, to obtain guidance on the implementation of the Multilingual Internet. Then appeals would continue in the outside world. The target is not to oppose Vision A. It is, on the contrary, to make sure it is viable. As the only solution permitted, it will NOT survive, because it is not able to resist all that one can expect people to do with it out of control. We had a very similar case with IDNA. The only response to homograph phishing was "we discussed it"....

I will document a few of these points in responding to Peter's last mail.

At 14:11 29/08/2005, Peter Constable wrote:
> From: Bruce Lilly <blilly(_at_)erols(_dot_)com>
> > This
> > is all what this proposition is about. This proposition is to give
> > _one_shot_ in a _standardised_ way the language, the script and the
> > country.
>
> This was discussed during Last Call of the previous non-IETF (individual
> submission) attempt.  IIRC David Singer brought up several examples of
> other pieces of information (e.g. legal/copyright variations) that could
> also be negotiated and which might affect the presentation of content (or
> choice among alternative content).  Lumping all of these separate items
> into one tag is a poor design as it impedes negotiation and tends toward
> lengthy tags which are incompatible with fixed-length mechanisms such as
> MIME encoded-words.

I agree that it would be poor design to incorporate other pieces of
information such as legal/copyright variations into language tags, but
as such pieces of information are not supported by the draft, this
appears to be irrelevant.

This is inexact. There is no problem in having the Draft compliant tag:

fr-Latn-fr-gayssot

to indicate a French-language text fully respecting the "Loi Gayssot", the anti-racism law used against Yahoo. There is no guarantee that an ISP or French law will not filter out pages from suspected sites not bearing that tag, transferring the Host's legal responsibilities to the Author.
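
To make the shape of such a tag concrete, here is a minimal Python sketch that splits a tag into language, script, region and variant subtags. The pattern is a simplified subset of the draft's syntax and is an assumption of this sketch: extended language subtags, extensions, private use and grandfathered tags are left out. It only checks the form of the tag and says nothing about whether a subtag such as "gayssot" is registered.

```python
import re

# Simplified subset of the draft's tag syntax (assumption of this sketch):
# language = 2-3 letters, script = 4 letters, region = 2 letters or
# 3 digits, variant = 5-8 alphanumerics or a digit plus 3 alphanumerics.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_langtag(tag):
    """Return a dict of subtags, or None if the tag does not fit the pattern."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Turn "-a-b" into ["a", "b"]; an empty string becomes an empty list.
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts


parts = split_langtag("fr-Latn-fr-gayssot")
# parts["variants"] holds the trailing subtag; nothing in the syntax itself
# restricts what kind of distinction a variant expresses.
```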

The problem in believing that one can rule the world is that the world may not accept to be ruled.

We should rather focus on whether it is good design to incorporate
information related to linguistic and written-form attributes, as
supported in the draft, into a single tag. The consensus of the LTRU
working group is that it is.

Let me phrase it in a more exact way: the affinity group which formed the WG has been gathered around that idea.

1. basic written-mode attributes should not be specific to the description of a language ... while, in addition, most languages are oral.
2. in what manner is the country code related to a specific piece of information? Nowhere in the Draft is this attribute documented: is it the location where the text was written, the location of the lingual community of the author, or that of the lingual community of the reader? Where is that location definition documented, so that both sides of the relation can understand each other when negotiating?

 For instance, the use of separate tags for
language and script were considered and rejected

This has not been considered and rejected. It was a predefined article of faith, and every question on it has been defeated.

The problem is that it is meaningless and conflicting with the charset!!!
Until you associate a "script" with a charset, a script has no meaning ....

I asked the simple question: "does fr-Latn-FR mean that Latn permits me to properly write French?" To know that, I need to know which characters are associated with "Latn". No response. Same question on the Unicode list. Non-French mother-tongue members said "yes" (but no one was able to demonstrate it). French mother-tongue experts said "no" and explained that Unicode lacks a particular space needed to properly type typical French sentences, and one accented character. This was then disputed. My problem, as a user and as a network standardiser, is that I should not have to be concerned with these details. I need certainties and guarantees that the Draft does not provide.

on the basis that the two are not entirely orthogonal. Clear examples of this were considered:
while the intent of

Accept-Language: ar, az-Cyrl, ru

is clear, the intent of

Accept-Language: ar, az, ru
Accept-Script: Cyrl

or of

Accept-Language: ar, az, ru
Accept-Script: Arab, Cyrl

is not clear, nor is it obvious how rules could be specified that would
make the intent clear, or that would permit expressing the preferences
reflected in the first instance.
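
The ambiguity described above can be illustrated with a small Python sketch. "Accept-Script" is the hypothetical separate header from the example, and the cross-product rule below is only one naive way a receiver might combine the two headers; it is an assumption made for illustration, not a rule from the draft.

```python
from itertools import product


def parse_list(header_value):
    """Split a comma-separated header value into trimmed items.

    Any ";q=..." parameter is dropped; preference weights are ignored here.
    """
    items = (item.split(";")[0].strip() for item in header_value.split(","))
    return [item for item in items if item]


def cross_product_reading(languages, scripts):
    """Naive reading: every listed script applies to every listed language."""
    return [f"{lang}-{script}" for lang, script in product(languages, scripts)]


# Single combined tag per preference: the script is attached to "az" only.
combined = parse_list("ar, az-Cyrl, ru")

# Separate headers: nothing says which language "Cyrl" belongs to.
separate = cross_product_reading(parse_list("ar, az, ru"), parse_list("Cyrl"))

# With two scripts the naive reading yields every pairing.
two_scripts = cross_product_reading(
    parse_list("ar, az, ru"), parse_list("Arab, Cyrl")
)
```

Under this reading the separate headers produce "ar-Cyrl", a preference that the combined form never stated, which is the kind of unclear intent the quoted paragraph refers to.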

This kind of example is absurd. There is no more information, and more confusion, with the proposed system if a page or a part of a document is also assigned different, conflicting langtags ...

> Tagging identifies characteristics of a particular piece of content. For
> that purpose alone, it makes little difference (other than regarding the
> aforementioned compatibility issues with existing IETF mechanisms) whether
> the characteristics are lumped or separate.

On the contrary, it makes little difference only if the characteristics
in question are completely orthogonal. As pointed out above, the
characteristics of linguistic variety and written form are not
orthogonal, particularly when it comes to expressing user preferences,
and that it *does* make a difference if they are split into separate
metadata attributes or they are lumped together into a single metadata
attribute.

Explain.

I will go your way; however, you have not defined what a script is. Suppose the author is a Russian, sitting in NY, writing a page in Ukrainian and wanting the texts to be repeated in the Latn and Cyrl scripts, so that everyone there is able to read it. A very common proposition.

Please precisely document the langtags. And show what is not orthogonal in them.

> While that may be used to infer something about the content
> provider, such inferences may be unreliable...

Quite so. This point was discussed in the WG.

The question is whether the solution is acceptable. This LC is the LC of the document, not that of the WG or of me.

> Negotiation of separate characteristics is much
> simpler than that of a combined conflation of characteristics; each
> characteristic can be assigned separate preference values, and irrelevant
> characteristics (e.g. script w.r.t. spoken language) can be easily ignored.

Negotiation of separate attributes involving inter-related
characteristics is *not* simpler, as pointed out above. The draft fully
allows for irrelevant characteristics (e.g. script wrt audio content) to
be ignored. Again, what has been provided in the draft is in accordance
with the charter of the WG.

The Charter speaks of languages. You made clear that the Draft was language oriented and not written-language oriented. I am glad to learn that the mode is an irrelevant characteristic.

Most languages are oral. Their rendering in a written form is therefore an important piece of information ...

> As negotiation and related issues represent a critical technical issue for
> the design of language tags (viz. keeping separate characteristics out of
> *language* tags), it is essential that such negotiation issues be
> considered carefully before specifying the format of tags.  Unfortunately,
> that has not been done, and considering the published WG milestones it
> appears that that issue has not been taken into consideration...  However,
> it appears that the WG has not considered the issues, with the effect that
> the WG product lacks the "particular care" expected of BCP documents (RFC
> 2026).

It is unclear on what basis it is asserted that these issues have not
been considered by the WG. I believe most of the WG members would feel
that they have been reasonably taken into consideration.

I agree with that. But the question is where the related decisions were taken. On that, I would tend to fully agree with Bruce.

> Note that it is not the registration procedural issues that are typical of
> BCP documents that are problematic; rather it is the conflation of
> separate characteristics into a single tag syntax, specified in the same
> document, which raises problems related to content negotiation.

Bruce asserts (a) that there is conflation of separate characteristics,
and that (b) this creates problems in content negotiation. The WG
determined that the characteristics conflated into a single tag are not
independent, and that it would be *separation* into separate attributes
that would result in problems in content negotiation, not their
combination into a single attribute.

Governmental authority over content is not orthogonal to language in some parts of the world. The question is whether this is to be addressed as a general or a specific issue.

> Another large part of
> the problem is WG management; in addition to the issues raised by John
> Klensin the last time that LTRU participation was discussed on the IETF
> discussion list -- and with which I wholeheartedly agree -- it appears
> that management of WG participant conduct has been rather lax; proponents
> of the individual submission effort who are participating in the WG tend
> to resort to ad-hominem attacks when a problem is identified or when an
> alternative approach is raised, with no visible intervention by the WG
> co-chairs. That has also (i.e. in addition to the factors which John
> identified) had the effect of limiting WG participation by individuals.

It's unclear what bearing this has on what improvements can be made to
the drafts in fulfillment of the WG charter. I believe several WG
participants felt that management of conduct was lax, particularly in
relation to a very small number of participants with a penchant for
certain behaviours that would have challenged the best of moderators.

I suffered most of that: various innuendos about my age and my need for English teachers, contempt for my colleagues as "end users" vs. "IETF members" and "developers", physical allusions to my "possible broken nose", anonymous phone calls, loss of clients due to abusive mails they read under partners' corporate names, accusations of ignorance by ... documented ignorants, rumours, etc.

I agree that one of the moderators actively engaged in that process. But these are the risks of opposing big interests. When it went too far, I appealed to the AD. The problem was corrected in minutes. The AD decided to pursue the appeal and ruled in a way that was good for the stability of the WG. It is true that from then on, insults against me no longer resulted in my being banned, warned or insulted.

We are all grown boys. I have been in this kind of business for nearly 30 years. I have seen worse :-) (but usually more competent). I invited all my opponents, without any problem, to have a drink in Paris (but none came to the IETF meeting, or told me). It would have been nice.

As for the accusation that proponents of an earlier individual
submission engaged in ad-hominem attacks that went without intervention
by the WG co-chairs, resulting in the limitation of participation in the
WG by other individuals, in the absence of specific evidence,

Please refer to the mailing list. However, this is not a Last Call on the WG management, but a Last Call on the Document. The reasons why the document is incomplete should not be discussed so much, just what is missing or needs correcting.

But it is true that several people have been put off by the attitude of the authors. I would say that this was evaluated very early, and that the debate is better served when people overcome this. One judges a tree by its fruits. The deliverable is not perfect: this is what matters today.

 this
appears itself to be no more than an ad-hominem attack on those
individuals and on the WG co-chairs. To my knowledge, there was only one
individual in relation to whom other members of the WG acted in any way
that might discourage or hinder his participation,

Two disclosed. Two implied. This is mostly because I agreed to represent others. But what would have been the use of making the WG a battlefield? This is what the author wanted, so that the "best" would "win". This is not my vision of the IETF.

and such actions
arose only in response to repeated provocation from that individual

archives are here.

> Specification of "language" tag syntax which conflates other content
> characteristics prior to open and professional discussion of negotiation
> issues and alternative approaches would be a premature lock-in of a design
> choice.  As the document under discussion specifies a conflation of such
> characteristics without open discussion

It is asserted that there has been no open discussion of the matter of
conflation. This is untrue. It is asserted that there has been no open
discussion of alternatives; the only concrete alternative presented for
discussion was to have separate language and script tags, which
alternative was considered and rejected due to problems that arise in
content negotiation. The drafts submitted for review are in accordance
with the charter, and I believe I can say that in the opinion of WG
members matters of conflation and of negotiation issues were taken into
consideration, and were discussed in an open and professional manner.

Total disagreement on the outcome so far. But I hope we can overcome that with the help of the IETF/IESG.

A lot of things have already changed in what some say ....
jfc


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf