Re: Last Call: 'Tags for Identifying Languages' to BCP

 Date: 2005-08-25 20:55
 From: "JFC (Jefsey) Morfin" <jefsey(_at_)jefsey(_dot_)com>

the privacy problem is the "what you read, who you are" intelligence 
leak.


That is to some extent true of any negotiation mechanism and negotiated
value.

Today langtags are not yet much used (say the W3C people in the  
WG-ltru) when compared with  what they should in XML, HTML, etc.


XML, HTML, etc. are not IETF protocols and should not be the main
consideration in IETF work on IETF documents, especially as language tags
are heavily used by IETF protocols, notably MIME (RFCs 2045, 2047, 2231,
3282) and widely-deployed core IETF application protocols which use MIME
(e.g. the Internet Message Format and its applications (email, news, voice
messaging, EDI, etc.) and HTTP and applications using HTTP as a substrate).

This  
is all what this proposition is about. This proposition is to give 
_one_shot_ in a _standardised_ way the language, the script and the 
country.


This was discussed during Last Call of the previous non-IETF (individual
submission) attempt.  IIRC David Singer brought up several examples of
other pieces of information (e.g. legal/copyright variations) that could
also be negotiated and which might affect the presentation of content (or
choice among alternative content).  Lumping all of these separate items into
one tag is a poor design as it impedes negotiation and tends toward lengthy
tags which are incompatible with fixed-length mechanisms such as MIME
encoded-words.  While the document under discussion makes some mention of
this issue, it neither treats the issue adequately nor resolves the
underlying issue in a manner that would minimize the problems.
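
As a rough illustration of the encoded-word concern: RFC 2047 limits an
encoded-word to 75 characters including delimiters, and RFC 2231 places the
language tag inside the word, after the charset
(=?charset*language?encoding?text?=).  The Python sketch below computes how
many payload bytes remain in one B-encoded word as the tag grows.  The
function names and the sample tags are illustrative assumptions and are not
taken from the document under discussion.

```python
import base64

MAX_ENCODED_WORD = 75  # RFC 2047 upper bound on one encoded-word


def encoded_word(text, charset="UTF-8", language=None):
    """Build a B-encoded word, with the RFC 2231 '*language' suffix if given."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    lang = "*" + language if language else ""
    return "=?%s%s?B?%s?=" % (charset, lang, payload)


def payload_budget(charset="UTF-8", language=None):
    """Largest number of raw bytes that still fits in a single encoded-word."""
    lang_len = 1 + len(language) if language else 0  # '*' plus the tag
    overhead = len("=?") + len(charset) + lang_len + len("?B?") + len("?=")
    base64_chars = MAX_ENCODED_WORD - overhead
    return (base64_chars // 4) * 3  # base64 carries 3 bytes per 4 characters


# Sample tags are hypothetical; longer tags leave less room for text.
for tag in ("en", "en-Latn-US", "de-Latn-DE-1996"):
    print(tag, payload_budget(language=tag))
```

Under these assumptions a two-letter tag leaves 45 bytes of text per
encoded-word, while a ten-character tag leaves 39; every further subtag is
paid for out of the same fixed 75-character budget.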

It uses for that ISO codes. ISO never wanted to propose such  
a code where:

ar-arab-us are texts destined to the people interested in US Arabic 
community issues.
iw-hebr-ru are texts destined to people interested in Jewish Russian 
community,
etc.

When your browser accepts those langtags and you pursue the relation, 
this structured information can be filtered by ISP (for police, 
censoring, intelligence gathering, etc.) to know about their users. 
It can be used for searches on a large scale in search engines to 
know the mail you responded, etc. I suppose that in most of the world 
countries this is subject to privacy laws. I think that in France it 
is subject to the anti-racist law (the one used against Yahoo a few years 
ago).


Let's separate three issues:
1. privacy
2. tagging
3. negotiation

The privacy issue exists whenever any information is conveyed; the user
needs to balance privacy concerns with facilitation of communication.
Mechanisms such as TLS can be used to limit the visibility of the information
to the end points of communication; ultimately it boils down to a matter of
trust in the end-point partner in the communication exchange.  I believe
that the issue is dealt with adequately in the security considerations
section of the document under discussion (some mention of transport-level
protection of privacy would be welcome).

Tagging identifies characteristics of a particular piece of content.  For
that purpose alone, it makes little difference (other than regarding the
aforementioned compatibility issues with existing IETF mechanisms) whether
the characteristics are lumped or separate.  There are existing IETF
mechanisms which permit handling of either lumped or individual characteristics
(e.g. the extensible header field mechanism of RFC 2045 and the "feature/filter"
mechanism of RFC 2533/2738/2912).  Tagging per se identifies characteristics
of content.  While that may be used to infer something about the content
provider, such inferences may be unreliable, particularly for providers that
support a wide variety of characteristics for the content in question.
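
For tagging alone, the two styles carry the same information.  The Python
sketch below labels one message both ways, using the standard email
package.  Content-Language is the RFC 3282 field; "Content-Script" and
"Content-Country" are hypothetical field names invented here purely to
illustrate the RFC 2045 extension mechanism (additional fields beginning
with "Content-"), and the sample values are likewise assumptions.

```python
from email.message import EmailMessage


def tag_lumped(language, script=None, country=None):
    """Put every characteristic into a single Content-Language value."""
    msg = EmailMessage()
    parts = [p for p in (language, script, country) if p]
    msg["Content-Language"] = "-".join(parts)
    return msg


def tag_separate(language, script=None, country=None):
    """One header field per characteristic (extra field names are hypothetical)."""
    msg = EmailMessage()
    msg["Content-Language"] = language
    if script:
        msg["Content-Script"] = script      # hypothetical field name
    if country:
        msg["Content-Country"] = country    # hypothetical field name
    return msg


print(tag_lumped("sr", "Latn", "YU"))
print(tag_separate("sr", "Latn", "YU"))
```

A recipient that does not care about script can skip the extra field in the
second form without parsing anything; in the first form it must know the
internal syntax of the tag to do the same.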

Negotiation of characteristics is where several issues arise.  One such
issue, as discussed here in December 2004/January 2005, relates to an
algorithm for matching content characteristics (e.g. between a particular
piece of content and a specified range of acceptance, as in an RFC 3282
Accept-Language field).  RFC 3066 skirted that issue as it stopped short of
specification of an algorithm, and as it specified a mere two particular
characteristics (language per se, and country) which could be combined in
a tag.  That was not true of the individual submission, which combined at
least 5 characteristics and specified an algorithm.  As a result of issues
with that approach, the LTRU WG was established with a charter to produce a
BCP (for registration procedures) and a separate Standards Track document
for topics such as algorithms which are unsuitable for BCP.  A related issue
is the interaction of the established negotiation mechanism (viz. the RFC
3282 Accept-Language field) and potential use of the other (feature/filter)
mechanism for negotiation.  The Accept-Language field provides for
specification of language ranges and for associating a preference value
with specific languages (as defined in RFC 3066) or ranges.  The proposed
mechanism in the individual submission of late last year (essentially
unchanged in the LTRU product (see discussion below)) does not address the
language range issue, and that issue is greatly complicated by conflating
separate characteristics into a single tag.  Addressing the language range
issue is not a WG work item and, unfortunately, the algorithm issue is
scheduled to be a later work item than the registry issue.  Added to that
is the fact that the specification of the tag format is mixed with
registration procedures.  Negotiation of separate characteristics is much
simpler than that of a conflation of characteristics; each
characteristic can be assigned separate preference values, and irrelevant
characteristics (e.g. script w.r.t. spoken language) can be easily ignored.
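
The range and preference points can be sketched in Python.  The matching
rule used here is the simple prefix rule described for Accept-Language in
HTTP/1.1 (a range matches a tag if it equals the tag, or equals a prefix of
it that is followed by "-"); it is an assumption for illustration, not an
algorithm taken from the document under discussion.  The tag "sr-Latn-YU"
is a hypothetical example of a tag with a script subtag inserted, and
multiplying per-characteristic preference values is an arbitrary choice
made only to keep the sketch short.

```python
def parse_accept(value):
    """Parse 'sr-YU, en;q=0.5' into a list of (range, q) pairs."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default preference when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs.append((parts[0].lower(), q))
    return prefs


def range_matches(lang_range, tag):
    """Prefix rule: exact match, or range is a prefix followed by '-'."""
    lang_range, tag = lang_range.lower(), tag.lower()
    return (lang_range == "*" or tag == lang_range
            or tag.startswith(lang_range + "-"))


def quality(prefs, tag):
    """Preference value of the longest matching range, 0.0 if none match."""
    best_len, best_q = -1, 0.0
    for lang_range, q in prefs:
        if range_matches(lang_range, tag):
            length = 0 if lang_range == "*" else len(lang_range)
            if length > best_len:
                best_len, best_q = length, q
    return best_q


def separate_quality(prefs_by_characteristic, content):
    """Score content whose characteristics are carried separately.

    A characteristic that the content does not declare (e.g. script for
    spoken audio) is simply skipped.
    """
    score = 1.0
    for name, prefs in prefs_by_characteristic.items():
        if name in content:
            score *= quality(prefs, content[name])
    return score


prefs = parse_accept("sr-YU, en;q=0.5")
print(quality(prefs, "sr-YU"))       # 1.0
print(quality(prefs, "sr-Latn-YU"))  # 0.0: inserted script defeats the range

by_char = {"language": parse_accept("sr, en;q=0.5"),
           "script": parse_accept("Latn, Cyrl;q=0.3")}
print(separate_quality(by_char, {"language": "sr"}))                    # 1.0
print(separate_quality(by_char, {"language": "sr", "script": "Cyrl"}))  # 0.3
```

With the lumped tag, a range written against the two-characteristic form
silently stops matching once another characteristic is inserted into the
tag; with separate characteristics each preference list is matched on its
own and an undeclared characteristic costs nothing.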

As negotiation and related issues represent a critical technical issue for
the design of language tags (viz. keeping separate characteristics out of
*language* tags), it is essential that such negotiation issues be considered
carefully before specifying the format of tags.  Unfortunately, that has not
been done, and considering the published WG milestones it appears that that
issue has not been taken into consideration.  It should be pointed out that
such issues have been raised, both in the discussion during Last Call of the
individual submission and as a result of subsequent work.  However, it
appears that the WG has not considered the issues, with the effect that the
WG product lacks the "particular care" expected of BCP documents (RFC 2026).
Note that it is not the registration procedural issues that are typical of
BCP documents that are problematic; rather it is the conflation of separate
characteristics into a single tag syntax, specified in the same document,
which raises problems related to content negotiation.

Part of the problem is the scheduling of WG work items as noted above
(viz. negotiation issues are critical to design of tag syntax, and should not
have been deferred until after syntax specification).  Another large part of
the problem is WG management; in addition to the issues raised by John
Klensin the last time that LTRU participation was discussed on the IETF
discussion list -- and with which I wholeheartedly agree -- it appears that
management of WG participant conduct has been rather lax; proponents of the
individual submission effort who are participating in the WG tend to resort
to ad-hominem attacks when a problem is identified or when an alternative
approach is raised, with no visible intervention by the WG co-chairs.  That
has also (i.e. in addition to the factors which John identified) had the
effect of limiting WG participation by individuals.

Specification of "language" tag syntax which conflates other content
characteristics prior to open and professional discussion of negotiation
issues and alternative approaches would be a premature lock-in of a design
choice.  As the document under discussion specifies a conflation of such
characteristics without open discussion -- indeed hampered by unchecked
unprofessional conduct -- it should not be approved as BCP in its current
form.  Separation of syntax specification to a separate document, to be
specified after due consideration of negotiation issues, leaving purely
procedural issues of registration, would be one approach to enable making
a decision on BCP registration procedures independently of and in advance of
a concrete specification of negotiation issues and tag syntax.  However,
as it stands, the document cannot be evaluated for soundness of the tag
syntax design in the absence of a specification that addresses negotiation
issues (in a backwards-compatible manner with the existing negotiation
mechanisms, viz. MIME Content- and Accept- fields and feature/filter
negotiation).

Therefore, at minimum, I recommend that the IESG defer a decision on the
subject document until such time as the full impact of the early design
choice to conflate multiple characteristics into a single tag can be fully
evaluated w.r.t. proposed matching algorithms and impact on existing
IETF-approved negotiation mechanisms.   Revision to move the syntax
specification to a separate document, as mentioned above, would permit
evaluation of the registration procedures per se independently of such
concerns, and would be one way to move forward on those registration
procedures quickly (i.e. independently of analysis of the syntax design)
if that is deemed desirable.

Aside from that, the IESG (via the cognizant ADs) should address the issues
of WG charter work items and milestones as they relate to consideration of
negotiation issues prior to locking down a tag syntax specification, should
emphasize the importance of backwards compatibility with established,
approved, and widely deployed IETF protocols and mechanisms, and should
discuss WG participant conduct (viz. ad-hominem attacks) and mailing list
issues (as identified by JCK) with the WG co-chairs.

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf