Re: Last Call: 'Tags for Identifying Languages' to BCP

From: Bruce Lilly <blilly(_at_)erols(_dot_)com>

This is all that this proposition is about. This proposition is to give,
in _one_shot_ and in a _standardised_ way, the language, the script and
the country.


This was discussed during Last Call of the previous non-IETF (individual
submission) attempt.  IIRC David Singer brought up several examples of
other pieces of information (e.g. legal/copyright variations) that could
also be negotiated and which might affect the presentation of content (or
choice among alternative content).  Lumping all of these separate items
into one tag is a poor design as it impedes negotiation and tends toward
lengthy tags which are incompatible with fixed-length mechanisms such as
MIME encoded-words.

I agree that it would be poor design to incorporate other pieces of
information such as legal/copyright variations into language tags, but
as such pieces of information are not supported by the draft, this
appears to be irrelevant.


I agree with both points.

We should rather focus on whether it is good design to incorporate
information related to linguistic and written-form attributes, as
supported in the draft, into a single tag. The consensus of the LTRU
working group is that it is. For instance, the use of separate tags for
language and script were considered and rejected on the basis that the
two are not entirely orthogonal. Clear examples of this were considered:
while the intent of

Accept-Language: ar, az-Cyrl, ru

is clear, the intent of

Accept-Language: ar, az, ru
Accept-Script: Cyrl

or of

Accept-Language: ar, az, ru
Accept-Script: Arab, Cyrl

is not clear, nor is it obvious how rules could be specified that would
make the intent clear, or that would permit expressing the preferences
reflected in the first instance.
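
To make the ambiguity concrete, here is a minimal Python sketch. It
assumes one naive reading of the split form, namely that every listed
language may be paired with every listed script. Accept-Script is only
the hypothetical header from the examples above, and the function names
are illustrative, not part of any draft or library.

```python
from itertools import product


def parse_list(value):
    """Split a comma-separated header value into trimmed items, in order."""
    return [item.strip() for item in value.split(",") if item.strip()]


def cross_product_reading(languages, scripts):
    """One naive (assumed) reading of the split form: any language, any script."""
    return [f"{lang}-{script}" for lang, script in product(languages, scripts)]


# Single-field form: the script is attached only to the language it qualifies.
combined = parse_list("ar, az-Cyrl, ru")

# Split form: hypothetical Accept-Script header alongside Accept-Language.
languages = parse_list("ar, az, ru")
scripts = parse_list("Arab, Cyrl")
cross = cross_product_reading(languages, scripts)

print(combined)
print(cross)
```

Under this reading the split form admits pairs such as az-Arab and
ar-Cyrl that the single-field form never asked for, and it has no way to
say "Azerbaijani in Cyrillic only, Arabic and Russian in whatever script
is offered". Other readings are possible, which is exactly the problem.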


This is such an important point that it deserves to be called out, lest it
be lost in the flurry of messages on this topic.

Designs that tag, say, script and language separately appear at
first glance to be more flexible and general. But appearances can be
deceiving. The problem is that using separate labels does not provide
an easy way of linking the two, and being able to express these
linkages is vital.

Tagging identifies characteristics of a particular piece of content. For
that purpose alone, it makes little difference (other than regarding the
aforementioned compatibility issues with existing IETF mechanisms) whether
the characteristics are lumped or separate.

On the contrary, it makes little difference only if the characteristics
in question are completely orthogonal.


And in the case of language and script tags the information is almost always
inseparable - as far from orthogonal as you can get.

As pointed out above, the
characteristics of linguistic variety and written form are not
orthogonal, particularly when it comes to expressing user preferences,
and it *does* make a difference whether they are split into separate
metadata attributes or lumped together into a single metadata
attribute.


To be totally fair, it would be possible to define a linkage between the two.
However, the representation would end up being fairly complicated, not to
mention being totally incompatible with the existing field syntax. As far as I
can see the only time a multiple field plus linkage would be a win is when the
repetition of subordinate information resulted in an overly long field. But the
sizes of the tags here are so small that this is at best a marginal corner
case.

In summary, I believe the approach of using separate fields offers no
advantages and has numerous disadvantages over the approach that was
chosen, and that the WG was correct to reject it.

                                Ned

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf