IETF-related spam from JFC Morfin

I have again received spam from JFC Morfin similar to the messages I
complained about last week:
http://www1.ietf.org/mail-archive/web/ietf/current/msg43675.html
See the attached message whose headers claim that it has been sent to a
couple of IETF lists, but which was actually BCCed as unsolicited bulk
email to various people who do not want to receive email from jefsey.
This kind of abuse should justify removing this person's list posting
rights.

Tony.
-- 
f.a.n.finch  <dot(_at_)dotat(_dot_)at>  http://dotat.at/
FORTIES CROMARTY FORTH: SOUTHERLY 6 TO GALE 8, DECREASING 5 OR 6 LATER. RAIN
OR SHOWERS. MODERATE OR GOOD.

--- Begin Message ---
I copy this important exchange with my preliminary note to most of the people I know who are interested in the language diversity issue. This is a long post. However, the exchange is between certainly the two most competent and key persons in the area and deals with the most important point for language digital support.
All the best.
jfc


Preliminary note.

1. review of the positions
- in many ways we can share the positions expressed by Peter Constable (Microsoft, TC37, JAC, author of ISO 639-3).
- at the same time we understand what Mark Davis (Google, President of Unicode, Director of the CLDR project, co-author of RFC 4646) implies.
- it is noteworthy that neither of them refers to ISO 15897. Maybe the crux of the problem?
2. better later than never
Unfortunately, 15 months ago Mark Davis and Peter Constable denied me these answers and the resulting debate. All the other points that they are eventually starting to debate (IANA management and access, language registry reviewing process management, Suppress-script, CLDR, multi-media, etc.), and those they will have to pursue (interoperability, IRI-tags, IDNs, protocols, relational spaces, retro-meta-spam, primary languages, language definition, new modes support, ethics points, cultural and language diversity, revision of RFC 4646 concepts, etc.), should likewise have been addressed long ago.
3. status of the http://bcp47.com endeavour
I thank you for your comments. I will try to use them for a global review of this key issue on http://bcp47.com. I remind you that we do not want to start implementing this site before we can start with a comprehensive description of the BCP 47 doctrine, confirmed by the IESG and possibly the IAB. To that end, a certain number of appeals have been engaged, raising the necessary questions to obtain the necessary formal positions.
- IESG has addressed the RFC 4646 appeal in a way which did not require an appeal to IAB.
- IESG has not addressed the interoperability, RFC 4646 respect, ethic, IANA control, etc. issues through the appeals against the DoS imposed on me. This is now escalated to the IAB.
- IESG had promised to expedite the responses on RFC 4647 before publishing it. The RFC has now been published for nearly three weeks.
- the confusion over the respect of the RFC 4646 Review Process will result in an appeal against its discrepancies. I appealed to Michael Everson and wait for his answer.
- the confusion over the WG-LTRU debate due to its charter renewal (they start discussing the points they denied in their charter) has completed its preliminary RFC 2026 procedure. Appeal to the IESG is under preparation.
4. links
Information on CLDR and on the Unicode consortium can be found under http://unicode.org. This consortium works on, publishes, and maintains many more files and tables than its contribution to ISO 10646.
Initial mail of Peter Constable
On 9/27/06, Peter Constable <petercon(_at_)microsoft(_dot_)com> wrote:
[This is running a risk of straying off topic for this list, but I'll post this here since it still pertains to Don's questions regarding whether particular reg entries should have certain info added to them.]
> From: ietf-languages-bounces(_at_)alvestrand(_dot_)no [mailto:ietf-languages-bounces(_at_)alvestrand(_dot_)no] On Behalf Of Kent Karlsson
> > that region is a key attribute of a locale,
>
> ...no.
Please explain. I guess this might depend on one's view of what the minimal set of information categories that are required for a locale consists of.
> > locale ID must always include a region component as well as a
> > language component.
>
> CLDR locales don't. Just about all locale data can, and often should,
> be in the "language only" named locales. Very rarely is there a difference
> from those locales that belong in the "language_territory" sublocales.
Not being a participant in the CLDR project, I'm not in a good position to evaluate the intent of the data I see there. I do note that, e.g. there is a file "en.xml". But clearly there is no such thing as a region-neutral English locale: every English speaker lives in a region where one of "M/d/yy" or "d/M/yy" is the preferred short date format (and probably the majority live in regions that prefer the latter), but this data file is not neutral wrt short date format: in spite of the name, the data it contains really is applicable to the US. Now, perhaps the intent here is that this is data that can be used as a default if region-specific data is not available, but it seems to me that's just a roundabout way of saying that en-US is used as the default locale for English.
> Yes, but choosing (a single) currency or a choosing a measurement
> system does not belong in a locale. Doing that is a mistake, similar to
> that of selecting character encoding via locale (as, unfortunately done
> in Unix/POSIX locales).
These are only ever defaults. It's not appropriate to assume that every English speaker in the US wants a short date format of "M/d/yy", but it is an appropriate default in that scenario. In the same way, it's not appropriate to assume that a user in the US will always use imperial units of measure, but it is reasonable to treat imperial units as a default. Same for currency.
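
As an illustration only, the "defaults" point above can be sketched in Python: region-derived values are used unless the user has stated a preference. The REGION_DEFAULTS table, its values, and the effective_settings function are hypothetical names invented for this sketch and do not come from CLDR or any real library.

```python
# Hypothetical region defaults, written by hand for illustration only.
REGION_DEFAULTS = {
    "US": {"short_date": "M/d/yy", "units": "imperial", "currency": "USD"},
}


def effective_settings(region, user_prefs=None):
    """Start from the region's defaults, then let explicit user choices win."""
    settings = dict(REGION_DEFAULTS.get(region, {}))
    settings.update(user_prefs or {})
    return settings


# A US user who prefers metric keeps the other US defaults.
print(effective_settings("US", {"units": "metric"}))
```

The design choice is simply that the region supplies a starting point and never overrides what the user has chosen.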
Review of the CLDR project by Mark Davis
From: mark(_dot_)edward(_dot_)davis(_at_)gmail(_dot_)com 
[mailto:mark(_dot_)edward(_dot_)davis(_at_)gmail(_dot_)com]
Now, the difference between "language" identifiers and "locale" identifiers is notoriously slippery, so I'll provide some background on how CLDR is actually structured, so you don't have to guess.
The CLDR data is separated into language-specific data, and non-language specific data. The language-specific data does *not* include items like the currencies for a country, or the weekend days, etc.; that is all in the non-language specific data. Here are some examples:
http://unicode.org/cldr/data/common/collation/
http://unicode.org/cldr/data/common/main/
The non-language-specific data includes which currencies were valid in a particular country during which years, or which languages are customarily written in which scripts. Some examples are:
http://unicode.org/cldr/data/common/supplemental/
http://unicode.org/cldr/data/common/transforms/
The so-called locale inheritance is used for the language-specific data, not the non-language-specific data, so it would be more accurate to call it language inheritance. The vast majority of the language-specific data does not differ by country. While, for example, the content of en.xml is chosen to be appropriate for the most populous country speaking en (the US), that doesn't mean that content is *always* inappropriate for many of the other regions that could use English (eg AG AI AS AU AW BB BM BS BW BZ CA CC CK CM CX DM ER FJ FK FM GB GD GH GI GM GY HK IE IN IO JM KE KI KN KY LC LR LS MH MP MS MT MW NA NF NG NR NU NZ PG PH PK PN PW RW SB SG SH SL SZ TC TK TO TT TZ UG UM US VC VG VI ZA ZM ZW).
In cases where content does differ according to the region, such as the UK, then one includes overrides of what is in en.XML. (Where the language-specific data for two locale/language tags are the same and different than the base, one can be aliased (either in full or in part) to the other. Thus if en_ZW, for example, followed UK spelling conventions, then it could be aliased to en_UK. While the files use "_", CLDR recognizes "-" and "_" as equivalent in identifiers.)
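
As an illustration only, the inheritance, override, and alias behaviour described above can be sketched in Python. The LOCALE_DATA and ALIASES tables, their values, and the normalize and lookup functions are hypothetical names invented for this sketch; they are not CLDR's actual files or API, and the en_ZW alias is only the "if" case from the text.

```python
# Hypothetical, hand-written data; not taken from the real CLDR files.
LOCALE_DATA = {
    # Base language data, chosen here to suit the US.
    "en": {"language_name": "English", "short_date": "M/d/yy", "colour_word": "color"},
    # Regional entry holds only the values that differ from the base.
    "en_UK": {"short_date": "d/M/yy", "colour_word": "colour"},
}

# Hypothetical alias: en_ZW reuses the en_UK overrides.
ALIASES = {"en_ZW": "en_UK"}


def normalize(tag):
    """Treat "-" and "_" as equivalent separators."""
    return tag.replace("-", "_")


def lookup(tag, key):
    """Resolve key by walking from the most specific tag up to its parent."""
    tag = normalize(tag)
    while tag:
        tag = ALIASES.get(tag, tag)
        data = LOCALE_DATA.get(tag, {})
        if key in data:
            return data[key]
        # Drop the last subtag: en_UK -> en -> "" (which ends the loop).
        tag = tag.rpartition("_")[0]
    raise KeyError(key)


print(lookup("en-AU", "short_date"))     # no en_AU data, falls back to en
print(lookup("en_ZW", "colour_word"))    # resolved through the alias to en_UK
print(lookup("en-UK", "language_name"))  # not overridden, inherited from en
```

A tag with no data of its own costs nothing to support: it simply resolves to whatever its parent provides.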
You say:
>But clearly there is no such thing as a region-neutral English locale
This sentence is a bit slippery; it depends highly on what one means by locale. Let me recast it. For a given type of content (eg country names) and a given language subtag, there may be differences among regions (as defined by BCP47) or it may be that all regions share the same values. (For that matter, there may be differences *within* regions, as well -- either according to sub-region that BCP 47 isn't fine-grained enough for (eg for some speech applications the differences in Bostonian English may be important).)
Where there are differences in regions, the region is important. Where there are not differences between regions, the region is not important. Thus in many cases, the CLDR data does not differ by country at all, so requiring a country subtag is pointless. In that sense, I'd say your sentence
> that region is a key attribute of a locale,
is false. Region may or may not be significant, depending on the content, and depending on the language.
If you meant to say that the *ability* to have a region as a component of locale/language is key, then I'd agree with you -- otherwise one couldn't distinguish between en-US and en-UK content.
I do, however, agree with you on the major point: this is all about *defaults*; identifiers have an inherent limitation -- they represent some class of users, within which there will always be variations.
Mark
At 07:13 28/09/2006, Peter Constable wrote:
(Note: typography results from Peter Constable's use of HTML in his post).
As I said earlier, this very much depends on one's notion of what a locale is. You say,
"... it depends highly on what one means by locale. Let me recast it. For a given type of content (eg country names) and a given language subtag, there may be differences among regions ... or it may be that all regions share the same values."
You are picking out one particular data category, country names. That is not a locale, by any usage I've ever seen before now! I don't in the slightest question that, for a single data category for which the values are linguistic expressions, region is not necessarily relevant. But again, that is not a locale.
You are casting "locale" as a data collection that is completely variable wrt the data categories it contains, with no minimal set of required data categories (there's only the proviso that there be at least one kind of content). I can easily imagine that's a useful approach to managing data in a repository like CLDR, where the only functional requirement is data management. But a data collection in that context is just that, a collection of data, not a locale. A locale is a locale by virtue of its role within a software implementation.
So, while I have no problem saying that a set of country names in English is locale data, I would not say that makes it a locale. But, of course, the way I am casting it leaves open the question of just what the "role within a software implementation" needs to look like.
And, of course, I'm assuming a model that's been around for a while in which a software implementation has various functions to produce various kinds of culture-dependent results -- provide a country name, format a numeric value as a currency string, sort a set of data, etc. -- where all of those functions have in common a particular parameter that uses one set of system-recognized symbols to determine the culture to be assumed in producing any of those results. In that model, I contend that region is always a key factor in the cultural distinctions because there is always one or more functions that produce results that are regionally determined or even specific to a particular region: date format, default currency symbol, etc.
And, of course, it's possible to imagine an implementation that doesn't use that culture-atom model -- i.e. an implementation in which different sets of symbols are used for parameterizing different clusters of functions. The whole set of functions still have in common that they produce some kind of culture-dependent result, but different ones use different parameters to determine different cultural attributes as are relevant for the given function. So, for instance, a function that formats a numeric date value as a day name in a given language might use as a parameter just a language ID with no region element, while another function that formats a numeric value as a currency string might use as a parameter just a region ID with no language element. Perhaps software implementations in the future will all work this way such that there are no longer any functions that rely on parameters that correspond to "locale IDs" / LCIDs as those are understood in that model described in the preceding paragraph. In that case, you might well have a situation in which IDs with region elements are needed only exceptionally -- as you are suggesting. But in that case, I'd say that those identifiers that are used are IDs for some other notions, not locale IDs.
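
As an illustration only, the two models contrasted above can be sketched in Python. All names and data here (DAY_NAMES, CURRENCY_SYMBOLS, day_name, format_currency, render_with_locale_id) are hypothetical and invented for this sketch; they do not correspond to any real system's API.

```python
# Hypothetical data tables, written by hand for illustration only.
DAY_NAMES = {  # keyed by language ID only; index 0 is Monday
    "en": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
    "fr": ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"],
}
CURRENCY_SYMBOLS = {"US": "$", "GB": "£"}  # keyed by region ID only


def day_name(language, weekday):
    """Separate-parameter model: needs a language ID, no region element."""
    return DAY_NAMES[language][weekday]


def format_currency(region, amount):
    """Separate-parameter model: needs a region ID, no language element."""
    return f"{CURRENCY_SYMBOLS[region]}{amount:,.2f}"


def render_with_locale_id(locale_id, weekday, amount):
    """Culture-atom model: one locale ID parameterizes every function."""
    language, _, region = locale_id.replace("_", "-").partition("-")
    return day_name(language, weekday), format_currency(region, amount)


print(day_name("fr", 0))
print(format_currency("US", 1234.5))
print(render_with_locale_id("en-US", 0, 5))
```

In the single-ID model the region element is always required because at least one function behind the shared parameter needs it; in the separate-parameter model each function asks only for the attribute it actually uses.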
Again, this is probably straying off topic for this list, so I should let this one go.
Peter
--- End Message ---

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf