Re: Last Call: 'Tags for Identifying Languages' to BCP

On 00:40 26/08/2005, David Hopwood said:

JFC (Jefsey) Morfin wrote:
[...] Today, the common practice of nearly one billion Internet users is to be able to turn off cookies to protect their anonymous free usage of the web. Once the Draft enters into action they will be imposed a conflicting privacy violation: "tell me what you read, I will tell you who you are": any OPES can monitor the exchange, extract these unambiguous ASCII tags, and know (or block) what you read. You can call these tags in google and learn a lot about people. There is no proposed way to turn that personal tagging off, nor to encode it.
I don't know which browser you use, but in Firefox, I can configure exactly
which language tags it sends. If it were sending other information using
language tags as a covert channel (which it *could* do regardless of the
draft under discussion), I'd expect that to be treated as at least a bug,
and if it were a deliberate privacy violation, I'd expect that to cause a
big scandal.


Dear David,

the privacy problem is the "what you read, who you are" intelligence leak. Today langtags are not yet much used (say the W3C people in the WG-ltru) when compared with what they should be in XML, HTML, etc. This is all that this proposition is about. This proposition is to give in _one_shot_, in a _standardised_ way, the language, the script and the country. It uses ISO codes for that. ISO never wanted to propose such a code, where:

ar-arab-us are texts destined for people interested in US Arabic community issues.
iw-hebr-ru are texts destined for people interested in the Jewish Russian community,

etc.
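
To show how mechanically this "one shot" information can be read back out of a tag, here is a minimal Python sketch. It assumes a simplified tag shape (a 2-3 letter language, an optional 4-letter script, an optional 2-letter or 3-digit region) and ignores every other kind of subtag; the name split_langtag is hypothetical and is not taken from the Draft.

```python
import re

# Simplified shape assumed for illustration only: language (2-3 letters),
# optional script (4 letters), optional region (2 letters or 3 digits).
# Variants, extensions and private-use subtags are deliberately ignored.
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)

def split_langtag(tag):
    """Return language/script/region as a dict, or None if the tag does not fit."""
    m = _TAG.fullmatch(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }

print(split_langtag("ar-arab-us"))
print(split_langtag("iw-hebr-ru"))
```

Anyone who can see the tag in transit can run the same few lines, which is the point being made about filtering.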

When your browser accepts these langtags and you pursue the relation, this structured information can be filtered by ISPs (for police, censoring, intelligence gathering, etc.) to know about their users. It can be used for large-scale searches in search engines to know the mail you responded to, etc. I suppose that in most of the world's countries this is subject to privacy laws. I think that in France it is subject to the anti-racist law (the one used against Yahoo a few years ago).

The problem is that there is no way for the _receiver_ to turn it down. This is potentially dangerous spam: it is digital information I never asked for, which discloses information about me.

Is that a reason to kill the Draft? I do not think so, but it certainly shows the complexity of the issue - and the lack of preparation of the Draft (I proposed the Security section to better warn about the problem). IETF proposes a solution: it is the OPES. An OPES on the host side can remove the langtags or encrypt them at the request of the reader, without a change on the host. I tried to make the WG-ltru understand that not considering/reminding OPES at the same time as documenting langtags is criminal.

This is why I make the default proposition: the Draft's ABNF and system are considered as a starting default proposition, with hooks open to IRI Tags adapted to each situation at the decision of the user or of services he trusts.

Let us take the case above. A service provider can propose an OPES service, changing "he-hebr-us" into "x-abcf", and an internal OPES plug-in for the user to restore x-abcf into he-hebr-us, so his libraries work. And many L9 organisations/Governments are satisfied. He can even provide dynamically updated langtag aliases. However, a good service should guarantee the service is conflict free. This is no problem if the langtag alias is x-service.com:abcf (conforming with the URI Tag RFC), but this is forbidden by the Draft. My proposition is to use "0-" as a hook to a specific format, so the Draft ABNF is fully respected.
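
The aliasing idea can be sketched in a few lines of Python. This is only an illustration of the table lookup, not an OPES implementation: the names ALIASES, REVERSE, hide and restore are hypothetical, and the single table entry is the he-hebr-us / x-abcf pair used in this message.

```python
# Hypothetical alias table held by the service provider.
ALIASES = {"he-hebr-us": "x-abcf"}

# Reverse table used by the plug-in on the user's side.
REVERSE = {alias: tag for tag, alias in ALIASES.items()}

# "Conflict free" here means no two langtags share one alias.
assert len(REVERSE) == len(ALIASES), "two langtags map to the same alias"

def hide(tag):
    """Outbound side: replace a known langtag by its opaque alias."""
    return ALIASES.get(tag.lower(), tag)

def restore(tag):
    """User side: map an alias back to the original langtag."""
    return REVERSE.get(tag.lower(), tag)

print(hide("he-hebr-us"))
print(restore("x-abcf"))
```

Tags without an entry pass through unchanged in both directions, so libraries that only know the default format keep working.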

In that case "0-service.com:abcf" will not raise an error. And it will not conflict with the people using the default format (the format proposed by the Draft). The interest of "0-" is that it can be multilingual, so the hook can work in ASCII but also in punycode, and in any script. It can also be entirely numeric and possibly refer directly to an IPv6 address, making the scheme DN independent.

I support it as a transition standard track RFC needed by some, as long as it does not exclude more specific/advanced language identification formats, processes or future IANA or ISO 11179 conformant registries.
The grammar defined in the draft is already flexible enough.
(I suppose you mean more than just grammar. Talking of the ABNF is probably clearer?). I am certainly eager to learn how I can support modal information (type of voice, accent, signs, icons, feelings, fount, etc.), medium information, language references (for example is it plain, basic, popular English? used dictionary, used software publisher), the context (style, relation, etc.), the nature of the text (mono, multilingual, human or machine oriented - for example what is the tag to use for a multilingual file [printed in a language of choice]), the date of the langtag version being used, etc.
I mean that the grammar is flexible enough to encode any of the above attributes (not that it would be useful or a good idea to encode most
of them).


hmmm.... you take the responsibility of both declarations :-)
- that you _can_ encode it. But you do not provide examples.

- that it would not be useful or a good idea to encode basic content object attributes.

The Draft has introduced the "script" subtag in addition to RFC 3066 (which is an obvious change). However, in order to stay "compatible" with RFC 3066, the author says it cannot introduce a specific support of URI tags.
This objection seems to be correct: URI tags include characters not allowed by RFC 3066.
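
The character restriction can be checked with a short Python sketch. It encodes only the RFC 3066 syntax rule (a primary subtag of 1 to 8 letters, then "-"-separated subtags of 1 to 8 letters or digits) and does no registry lookup; the function name is_rfc3066_syntax is hypothetical.

```python
import re

# RFC 3066 syntax: primary subtag of 1-8 ASCII letters, followed by any
# number of "-"-separated subtags of 1-8 ASCII letters or digits.
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def is_rfc3066_syntax(tag):
    """True if the tag is well-formed under the RFC 3066 grammar."""
    return RFC3066.fullmatch(tag) is not None

for tag in ("he-hebr-us", "x-abcf", "x-service.com:abcf"):
    print(tag, is_rfc3066_syntax(tag))
```

The "." and ":" in the last tag, and the length of the subtag containing them, are what make it fail.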

Then? The purpose of this work is to address the limitations of RFC 3066. URI tags did not exist when RFC 3066 was written. Do you mean, for example, that langtags are to be ASCII only because RFC 3066 was ASCII only?

But you could easily encode the equivalent information to a URI tag, if you wanted to.

Please document how you do it, while respecting the hybrid format of the proposed ABNF, where information is identified not only by fixed position but also by relative position and size, with "-" as sole separator. And they want to keep labels between "-" at most 8 characters long. Tell me how you support IDNs.


Let us suppose that I have "lang-tags.org:" as a scheme,
or "xn--abcdef.com:". Tell me how you support them.
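
A small Python sketch makes the difficulty visible. It assumes, as a simplification, that every label between "-" must be 1 to 8 ASCII letters or digits (the RFC 3066 subtag rule; the Draft's ABNF has more structure than this), and the name bad_labels is hypothetical.

```python
import re

# Assumed label rule for illustration: 1-8 ASCII letters or digits.
LABEL = re.compile(r"[A-Za-z0-9]{1,8}")

def bad_labels(text):
    """Split on "-" and return the labels that break the assumed rule."""
    return [label for label in text.split("-") if not LABEL.fullmatch(label)]

for scheme in ("lang-tags.org:", "xn--abcdef.com:"):
    print(scheme, bad_labels(scheme))
```

In the first string the "-" inside the domain name is itself read as a separator, and in the second the punycode prefix "xn--" yields an empty label; in both, the part carrying "." and ":" is rejected.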
jfc



_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf