Re: Last Call: 'Tags for Identifying Languages' to BCP

 Date: 2005-08-28 16:25
 From: Frank Ellermann <nobody(_at_)xyzzy(_dot_)claranet(_dot_)de>

> That's a last call; if you have better ideas than those in the
> draft, speak up.  Your Content-Script idea is good, but won't
> help e.g. in encoded words (RFC 2047 + RFC 2231).


Encoded-words have several characteristics, among them a mandatory
charset and a limited length (in octets).  These have two implications
w.r.t. script:
1. Specifying script explicitly is unnecessary; it can be determined
   from the charset (always specified in an encoded-word) and the
   specific octets of the encoded text (ISO-8859-1 is Latin script,
   KOI8 is Cyrillic, etc.).
2. An encoded-word has limited space available.  Of a maximum of 76
   octets in an encoded-word specifying language, there are 8 for
   overhead, at least one (currently exactly one) for specification
   of encoding method, a charset specification (registered charsets
   have names up to 45 octets in length), the language tag, and some
   encoded text.  The encoded text must be at least one octet for Q
   encoding and a simple (unshifted) charset; for B encoding (and an
   unshifted charset) it has to be a multiple of 4 octets, and a typical
   charset with shift sequences will require on the order of 6 octets
   minimum (for Q encoding; 8-12 minimum for B encoding).  Specifying
   script (unnecessarily; see above) reduces the space available for
   actual (encoded) text, possibly to the point where no text fits at
   all in pathological cases.
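The arithmetic in item 2 can be checked with a short Python sketch.  It
assumes the RFC 2231 form =?charset*language?encoding?encoded-text?= and
RFC 2047's limit of 75 characters on an encoded-word; counted that way,
the delimiters ('=?', '*', two '?', and '?=') take 7 octets, which leaves
the same 68 octets for charset, encoding, language tag, and text as the
76/8 figures above.  The function names are illustrative only, and the
B-encoding capacity assumes an unshifted charset.

```python
# Sketch: room left for text in an RFC 2231 encoded-word,
#   =?charset*language?encoding?encoded-text?=
# assuming RFC 2047's limit of 75 characters per encoded-word.
MAX_ENCODED_WORD = 75


def text_budget(charset: str, language: str = "", encoding: str = "B") -> int:
    """Octets left for encoded-text once the delimiters, the charset, the
    optional language tag and the one-letter encoding are accounted for."""
    prefix = "=?" + charset
    if language:
        prefix += "*" + language  # RFC 2231 language specification
    prefix += "?" + encoding + "?"
    suffix = "?="
    return MAX_ENCODED_WORD - len(prefix) - len(suffix)


def b_capacity(budget: int) -> int:
    """Raw octets that fit in `budget` octets of B (base64) encoded-text.
    Base64 emits 4 output octets per 3 input octets, so round down to a
    multiple of 4.  Assumes an unshifted charset."""
    return (budget // 4) * 3


if __name__ == "__main__":
    for charset, lang in [
        ("ISO-8859-1", "en"),
        ("ISO-8859-1", "en-Latn"),
        ("Extended_UNIX_Code_Packed_Format_for_Japanese", "ja-Jpan"),
    ]:
        budget = text_budget(charset, lang)
        print(f"{charset}*{lang}: {budget} octets of B text, "
              f"{b_capacity(budget)} raw octets")
```

Under these assumptions, adding a script subtag to "en" costs 5 octets
of encoded-text (3 raw octets in B encoding), and with a 45-octet
registered charset name plus "ja-Jpan" only 9 raw octets remain.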

Specification of script is only a performance enhancement for long texts
(not the case for encoded-words) where a multi-script charset is in use.
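Item 1, and the point that a script indication only saves work on long
texts, can be illustrated with a Python sketch.  It decodes the octets
with the declared charset and labels each letter by the first word of its
Unicode character name; that labelling is a crude stand-in chosen here for
brevity (a real implementation would use the Unicode Script property), and
the function name is hypothetical.  For a single-script charset the answer
follows from the charset and octets; for a multi-script charset such as
UTF-8 the whole text has to be scanned, which is the cost a script
indication could avoid.

```python
from __future__ import annotations

import unicodedata


def scripts_in(encoded: bytes, charset: str) -> set[str]:
    """Decode `encoded` with the declared charset and return the first word
    of each letter's Unicode character name (e.g. "LATIN", "CYRILLIC").

    Crude heuristic, not the Unicode Script property: it only shows that
    script follows from charset plus octets, and that finding it for a
    multi-script charset means scanning every character.
    """
    text = encoded.decode(charset)
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


if __name__ == "__main__":
    print(scripts_in("Grüße".encode("iso-8859-1"), "iso-8859-1"))
    print(scripts_in("Привет".encode("koi8-r"), "koi8-r"))
    print(sorted(scripts_in("Hello, Мир".encode("utf-8"), "utf-8")))
```

The last call returns two scripts, the case in which no single script
indication would describe the text.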

While the Content-Script (or similar feature/filter mechanism) would not
be applicable to encoded-words, specification of script is unnecessary
for encoded-words (and undesirable because of its impact on the space
available for text).

Specification of script is only possible where a given text uses a single
script, and that limitation applies to any of the methods of indication
mentioned above, including the addition to language tags proposed by the
draft under discussion.

Script is a characteristic of written text; it is not applicable to (e.g.)
audio media types.  It really should be a text media type parameter (or
feature).
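For comparison with the draft's approach (a script subtag inside the
language tag, as in "sr-Cyrl"), here is a sketch of script carried as a
text media type parameter, using Python's standard
email.message.Message.  The parameter name "script" and its use of
ISO 15924 codes are assumptions for illustration; it is a hypothetical
parameter, not a registered text/plain parameter.

```python
from email.message import Message

# Hypothetical: carry script as a parameter of a text media type instead
# of as a language-tag subtag.  "script" is NOT a registered text/plain
# parameter; it stands in for the kind of parameter (or feature) above.
msg = Message()
msg["Content-Type"] = "text/plain"
msg.set_param("charset", "utf-8")
msg.set_param("script", "Cyrl")   # ISO 15924 code for Cyrillic
msg["Content-Language"] = "sr"    # the language tag itself stays script-free

print(msg["Content-Type"])
print(msg["Content-Language"])
```

Because the parameter belongs to the text type, nothing comparable would
be attached to, say, an audio/basic part, which matches the point that
script does not apply to audio.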

> This is a ready-for-Bruce's-review draft as far as I can judge
> this, but for obvious reasons only you can really judge it. ;-)


As I mentioned in an earlier message, without a concrete specification
for negotiation, it is not possible to fully assess the proposed syntax
changes.
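Why the matching rules matter for the syntax change can be seen with the
prefix rule that HTTP already uses for Accept-Language (RFC 2616,
section 14.4).  The Python sketch below implements that rule only, not any
algorithm from the drafts under discussion, and the function name is
hypothetical.  It shows that inserting a script subtag between language
and region changes which existing ranges match.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 2616 (section 14.4) prefix rule: a range matches a tag if it
    equals the tag, or equals a prefix of the tag such that the first
    character after the prefix is '-'.  Comparison is case-insensitive.
    (The special range "*" is not handled in this sketch.)"""
    r, t = language_range.lower(), tag.lower()
    return r == t or t.startswith(r + "-")


if __name__ == "__main__":
    for rng, tag in [
        ("sr", "sr-Latn-CS"),     # language-only range still matches
        ("sr-CS", "sr-Latn-CS"),  # region range no longer matches
        ("sr-Latn", "sr-Latn-CS"),
        ("en-US", "en"),          # a more specific range never matches a shorter tag
    ]:
        print(f"{rng!r} vs {tag!r}: {range_matches(rng, tag)}")
```

Under this rule a range such as "sr-CS" does not match "sr-Latn-CS", so
whether the proposed syntax is acceptable depends on a negotiation
mechanism that has not yet been specified.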

>> Addressing the language range issue is not a WG work item
>> and, unfortunately, the algorithm issue is scheduled to be a
>> later work item than the registry issue.


> Only my personal view of course, but the matching draft offers
> a syntactical form for ranges,


There is no such draft in Last Call at this time, as far as I know.

> if ISO 3166-1 pulls another "CS", 3066bis will handle it
> better than 3066 (no potential worldwide retagging confusion).


I am unaware of any "worldwide retagging confusion" w.r.t. language
tags and "CS".

>> it appears that management of WG participant conduct has been
>> rather lax


> IBTD, the WG Chairs and the responsible AD did a very good job.


As an affected party, I disagree.

>> Revision to move the syntax specification to a separate
>> document, as mentioned above, would permit evaluation of the
>> registration procedures per se


> You can also read chapter 3 per se: the mentioned 14 pages,
> plus 3.1 as an introduction (5 pages, format of the registry).


But it is the entire document that is being Last Called, not a single
section, and lacking a specification of negotiation mechanisms, it is
not possible to fully assess the document as it stands.

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf