RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

From: JFC (Jefsey) Morfin [mailto:jefsey(_at_)jefsey(_dot_)com]

Of course it would not be clear if you don't have a conceptual model of
what "language" tags are identifiers *of*. When RFC 3066 was being
developed, there was a suggestion that script IDs be incorporated, but
some were reluctant, raising the same question you have here. I was one of
those. But I didn't remain obstructionist over the issue; instead, I gave
a fair amount of thought to the ontology that underlies "language" tags,
and subsequently published a white paper and presented on the topic at two
conferences in the spring and fall of 2002. (Paper is available online at
http://www.sil.org/silewp/abstract.asp?ref=2002-003 -- my thinking has
evolved since then, but some key results remain valid, I think.)


May we know which ones?


It would be easier to identify two key points on which my thinking has changed.

IIRC, I was uncertain at the time about what to do wrt sorting. I have since 
concluded that sort order is a presentation issue that, while linguistically 
related, is out of scope for language identifiers. (Note that there is no 
common usage scenario in which it makes sense to declare the sorted order of 
content.) Sort order may certainly be in scope for a locale identifier, but not 
for a "language" tag.

The bigger change is that I have abandoned the fourth main category in the 
ontological model I proposed. At the time, I was still trying to work out where 
something like "Latin America Spanish" fit in. I saw the similarity to 
sub-language varieties / dialects, but at the time thought it needed to be a 
distinct category, for which reason I concocted the notion "domain-specific 
data set". 

I was never very satisfied with that: it wasn't a particularly consistent model 
(a data set is quite a different kind of thing from a language variety) and it 
ignored the similarity with sub-language variety. (And the name was a bit 
unwieldy.) 

I have since realized that I was tripping up on the very problem that was 
blocking the Language Tag Reviewer from accepting the requested registration 
for "es-americas": the assumption that a language tag necessarily refers to a 
conventionally-recognized linguistic identity that exists in the world. 
Language tags are not attributes declared on language varieties; they are 
attributes declared on information objects, indicating linguistic properties of 
those information objects. And the linguistic attributes of an information 
object do not necessarily coincide with conventionally-recognized linguistic 
identities. Of course, in the majority of useful cases they will; but it's not 
hard to show that this is not always the case: e.g. if I present "chat" as an 
expression that could be interpreted in relation to several different 
languages, it would be entirely appropriate for me to declare a linguistic 
attribute of that expression of "indeterminate" since that is precisely my 
intent -- but clearly "indeterminate" doesn't correspond with any particular 
language identity out in the world.

Thus, I came to realize that the kind of distinction intended by "es-americas" 
was just the same kind of distinction made for any sub-language variety: it 
declares that the information object is not only in some particular language, 
but is even more constrained in terms of the language variety in use. It is 
simply coincidental that the more constrained usage in this case doesn't 
coincide with a single dialect used by some identifiable speaker community.



Peter Constable

