Re: Last Call: 'Tags for Identifying Languages' to BCP

From: Bruce Lilly <blilly(_at_)erols(_dot_)com>

This is all what this proposition is about. This proposition is to give
_one_shot_ in a _standardised_ way the language, the script and the
country.


This was discussed during Last Call of the previous non-IETF (individual
submission) attempt.  IIRC David Singer brought up several examples of
other pieces of information (e.g. legal/copyright variations) that could
also be negotiated and which might affect the presentation of content (or
choice among alternative content).  Lumping all of these separate items
into one tag is a poor design as it impedes negotiation and tends toward
lengthy tags which are incompatible with fixed-length mechanisms such as
MIME encoded-words.


I agree that it would be poor design to incorporate other pieces of
information such as legal/copyright variations into language tags, but
as such pieces of information are not supported by the draft, this
appears to be irrelevant. 

We should rather focus on whether it is good design to incorporate
information related to linguistic and written-form attributes, as
supported in the draft, into a single tag. The consensus of the LTRU
working group is that it is. For instance, the use of separate tags for
language and script was considered and rejected on the basis that the
two are not entirely orthogonal. Clear examples of this were considered:
while the intent of 

Accept-Language: ar, az-Cyrl, ru

is clear, the intent of

Accept-Language: ar, az, ru
Accept-Script: Cyrl

or of

Accept-Language: ar, az, ru
Accept-Script: Arab, Cyrl

is not clear, nor is it obvious how rules could be specified that would
make the intent clear, or that would permit expressing the preferences
reflected in the first instance.
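A minimal Python sketch of the difference follows. It uses the RFC 3066 range-matching rule (a range matches a tag if it equals the tag, or equals a prefix of it followed by "-"). The variant lists, the script_of() helper and the two "readings" of a separate Accept-Script header are hypothetical, invented only for illustration. Neither reading comes from any draft.

```python
# Illustrative sketch only: contrasts one combined preference list with a
# split Accept-Language / Accept-Script pair. The variant lists, script_of()
# and both "readings" of Accept-Script are hypothetical.


def range_matches(lang_range: str, tag: str) -> bool:
    # RFC 3066 range matching: exact match, or the range equals a prefix of
    # the tag and the next character in the tag is "-" (case-insensitive).
    r, t = lang_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def script_of(tag: str):
    # Hypothetical helper: treat a four-letter second subtag as the script.
    parts = tag.split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        return parts[1].title()
    return None


def pick_combined(ranges, available):
    # Try ranges in preference order; the first matching variant wins.
    for r in ranges:
        for tag in available:
            if range_matches(r, tag):
                return tag
    return None


def pick_split_filter(languages, scripts, available):
    # Reading 1: Accept-Script is a hard filter applied to every variant.
    allowed = [t for t in available if script_of(t) in scripts]
    return pick_combined(languages, allowed)


def pick_split_tiebreak(languages, scripts, available):
    # Reading 2: Accept-Script only breaks ties among variants of one language.
    for lang in languages:
        matches = [t for t in available if range_matches(lang, t)]
        if matches:
            preferred = [t for t in matches if script_of(t) in scripts]
            return (preferred or matches)[0]
    return None


if __name__ == "__main__":
    for available in (["ar-Arab", "az-Latn", "az-Cyrl", "ru-Cyrl"],
                      ["az-Latn", "ru-Cyrl"]):
        print(available)
        print("  combined  ", pick_combined(["ar", "az-Cyrl", "ru"], available))
        print("  filter    ", pick_split_filter(["ar", "az", "ru"], ["Cyrl"], available))
        print("  tie-break ", pick_split_tiebreak(["ar", "az", "ru"], ["Cyrl"], available))
```

With the first variant list, the "filter" reading discards the Arabic preference and returns az-Cyrl, while the combined list returns ar-Arab. With the second list, the "tie-break" reading serves az-Latn, which the combined list ruled out. Neither reading of the split headers reproduces "ar, az-Cyrl, ru" in both cases.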

It was also the consensus of the WG that the concerns of fixed-length
mechanisms have been adequately addressed. This consensus was taken
after careful consideration of IETF protocols known to involve length
limitations. It should be noted in this regard that the likely length of
language tags under this draft is no different than under RFC 3066; the
only difference is that this draft imposes greater constraints on the
form and meaning of subtag elements.
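As a rough illustration of the fixed-length concern, the sketch below computes how much of an encoded-word's budget a language tag consumes. It relies on two rules from the specifications: RFC 2047 limits an encoded-word to 75 characters including delimiters, and RFC 2231 allows a "*language" suffix after the charset. The charset and sample tags are arbitrary examples, and the function names are hypothetical.

```python
# Sketch: how much room a language tag leaves in one MIME encoded-word.
# RFC 2047 caps an encoded-word at 75 characters; RFC 2231 adds an optional
# "*language" suffix to the charset. Charset and tags below are examples.

MAX_ENCODED_WORD = 75  # RFC 2047 limit, including all delimiters


def encoded_text_budget(charset: str, language: str = "", encoding: str = "B") -> int:
    """Characters left for encoded-text in one RFC 2231 encoded-word.

    Layout: "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
    """
    lang_part = "*" + language if language else ""
    overhead = (len("=?") + len(charset) + len(lang_part)
                + len("?") + len(encoding) + len("?") + len("?="))
    return MAX_ENCODED_WORD - overhead


def max_octets_b(budget: int) -> int:
    # The "B" (base64) encoding carries 3 octets in every 4 characters.
    return (budget // 4) * 3


if __name__ == "__main__":
    for tag in ["", "en", "az-Cyrl", "az-Cyrl-AZ"]:
        budget = encoded_text_budget("UTF-8", tag)
        print(f"{tag or '(none)':<12} encoded-text chars: {budget}  "
              f"B-encoded octets: {max_octets_b(budget)}")
```

The cost depends only on the number of characters in the tag. An RFC 3066 tag of the same length consumes exactly the same budget.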

While there is some mention of this issue in the document under
discussion, its treatment and resolving the underlying issue in a manner
that would minimize the problems are lacking.


It's unclear what is meant by "the underlying issue". Please clarify.

Tagging identifies characteristics of a particular piece of content. For
that purpose alone, it makes little difference (other than regarding the
aforementioned compatibility issues with existing IETF mechanisms) whether
the characteristics are lumped or separate.


On the contrary, it makes little difference only if the characteristics
in question are completely orthogonal. As pointed out above, the
characteristics of linguistic variety and written form are not
orthogonal, particularly when it comes to expressing user preferences,
and it *does* make a difference whether they are split into separate
metadata attributes or lumped together into a single metadata
attribute.

While that may be used to infer something about the content
provider, such inferences may be unreliable...


Quite so. This point was discussed in the WG.

Negotiation of characteristics is where several issues arise...

As a result of issues with that approach, the LTRU WG was established
with a charter to produce a BCP (for registration procedures) and a
separate Standards Track document for topics such as algorithms which
are unsuitable for BCP.


The LTRU WG is a little behind its initially-proposed schedule for
milestones, but otherwise is on track to complete the approved
milestones in order. Thus, the latter document is in progress.

The proposed mechanism in the individual submission of late last year
(essentially unchanged in the LTRU product (see discussion below)) does
not address the language range issue, and that issue is greatly
complicated by conflating separate characteristics into a single tag.


It is unclear how the drafts in question can be critiqued for failing to
address "the language range issue" (which issue is not clearly
identified here, btw) given the explicit plan in the charter that
algorithms for matching be addressed in a separate document to be
completed after these drafts.

Addressing the language range issue is not a WG work item and,
unfortunately, the algorithm issue is scheduled to be a later work item
than the registry issue.  Added to that is the fact that the
specification of the tag format is mixed with registration procedures.


That is in accordance with the charter.

Negotiation of separate characteristics is much simpler than that of a
combined conflation of characteristics; each characteristic can be
assigned separate preference values, and irrelevant characteristics
(e.g. script w.r.t. spoken language) can be easily ignored.

Negotiation of separate attributes involving inter-related
characteristics is *not* simpler, as pointed out above. The draft fully
allows for irrelevant characteristics (e.g. script wrt audio content) to
be ignored. Again, what has been provided in the draft is in accordance
with the charter of the WG.
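Both sides of this exchange can be illustrated with a small sketch: preference values on combined tags, and a script subtag ignored for spoken content. The q-value syntax follows the HTTP/1.1 Accept-Language header (RFC 2616). The helper names are hypothetical, and so is the rule that a four-letter second subtag is the script (ISO 15924 codes are four letters). Nothing here is taken from the drafts, which leave matching to a later document.

```python
# Sketch: q-value negotiation over combined tags, with the script subtag
# ignored for spoken content. parse_accept_language(), drop_script() and
# best_variant() are hypothetical helpers; "*" and other edge cases of the
# HTTP grammar are not handled.


def parse_accept_language(header: str):
    """Return (range, q) pairs from an Accept-Language value, highest q first."""
    prefs = []
    for item in header.split(","):
        lang_range, _, params = item.strip().partition(";")
        if not lang_range.strip():
            continue
        q = 1.0
        params = params.replace(" ", "")
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # simplifying assumption: a bad q-value means "not acceptable"
        prefs.append((lang_range.strip(), q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda p: -p[1])


def drop_script(tag: str) -> str:
    # Assumed rule: a four-letter alphabetic second subtag is a script; drop it.
    parts = tag.split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        del parts[1]
    return "-".join(parts)


def best_variant(header: str, available, spoken: bool = False):
    # For spoken content the script subtag is irrelevant, so compare without it.
    norm = drop_script if spoken else (lambda t: t)
    for lang_range, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        r = norm(lang_range).lower()
        for tag in available:
            t = norm(tag).lower()
            if t == r or t.startswith(r + "-"):
                return tag
    return None


if __name__ == "__main__":
    header = "az-Cyrl, ru;q=0.5"
    print(best_variant(header, ["az-Latn", "ru-Cyrl"]))     # text: ru-Cyrl
    print(best_variant(header, ["az", "ru"], spoken=True))  # audio: az
```

For text content, a user who accepts Azerbaijani only in Cyrillic falls back to Russian. For audio, the same preference list selects the Azerbaijani recording, because the script subtag is simply dropped before comparison.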

As negotiation and related issues represent a critical technical issue
for the design of language tags (viz. keeping separate characteristics
out of *language* tags), it is essential that such negotiation issues be
considered carefully before specifying the format of tags.
Unfortunately, that has not been done, and considering the published WG
milestones it appears that that issue has not been taken into
consideration...  However, it appears that the WG has not considered the
issues, with the effect that the WG product lacks the "particular care"
expected of BCP documents (RFC 2026).


It is unclear on what basis it is asserted that these issues have not
been considered by the WG. I believe most of the WG members would feel
that they have been reasonably taken into consideration. Again, what has
been submitted for last call is in accordance with the charter; just as
it is not reliable to infer something about a content provider from a
language tag, so also it is not reliable to infer from the order of
milestones in the charter that matching issues were not taken into
consideration in preparation of these drafts.

Note that it is not the registration procedural issues that are typical
of BCP documents that are problematic; rather it is the conflation of
separate characteristics into a single tag syntax, specified in the same
document, which raises problems related to content negotiation.


Bruce asserts (a) that there is conflation of separate characteristics,
and (b) that this creates problems in content negotiation. The WG
determined that the characteristics conflated into a single tag are not
independent, and that it would be *separation* into distinct attributes
that would result in problems in content negotiation, not their
combination into a single attribute.

Another large part of the problem is WG management; in addition to the
issues raised by John Klensin the last time that LTRU participation was
discussed on the IETF discussion list -- and with which I wholeheartedly
agree -- it appears that management of WG participant conduct has been
rather lax; proponents of the individual submission effort who are
participating in the WG tend to resort to ad-hominem attacks when a
problem is identified or when an alternative approach is raised, with no
visible intervention by the WG co-chairs.  That has also (i.e. in
addition to the factors which John identified) had the effect of
limiting WG participation by individuals.


It's unclear what bearing this has on what improvements can be made to
the drafts in fulfillment of the WG charter. I believe several WG
participants felt that management of conduct was lax, particularly in
relation to a very small number of participants with a penchant for
certain behaviours that would have challenged the best of moderators.

As for the accusation that proponents of an earlier individual
submission engaged in ad-hominem attacks that went without intervention
by the WG co-chairs, resulting in the limitation of participation in the
WG by other individuals, in the absence of specific evidence, this
appears itself to be no more than an ad-hominem attack on those
individuals and on the WG co-chairs. To my knowledge, there was only one
individual in relation to whom other members of the WG acted in any way
that might discourage or hinder his participation, and such actions
arose only in response to repeated provocation from that individual.

Specification of "language" tag syntax which conflates other content
characteristics prior to open and professional discussion of negotiation
issues and alternative approaches would be a premature lock-in of a
design choice.  As the document under discussion specifies a conflation
of such characteristics without open discussion


It is asserted that there has been no open discussion of the matter of
conflation. This is untrue. It is asserted that there has been no open
discussion of alternatives; the only concrete alternative presented for
discussion was to have separate language and script tags, which
alternative was considered and rejected due to problems that arise in
content negotiation. The drafts submitted for review are in accordance
with the charter, and I believe I can say that in the opinion of WG
members matters of conflation and of negotiation issues were taken into
consideration, and were discussed in an open and professional manner.



Peter Constable

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf