IETF-related spam from JFC Morfin
2006-09-28 04:59:10
I have again received spam from JFC Morfin similar to the messages I
complained about last week:
http://www1.ietf.org/mail-archive/web/ietf/current/msg43675.html
See the attached message whose headers claim that it has been sent to a
couple of IETF lists, but which was actually BCCed as unsolicited bulk
email to various people who do not want to receive email from jefsey.
This kind of abuse should justify removing this person's list posting
rights.
Tony.
--
f.a.n.finch <dot(_at_)dotat(_dot_)at> http://dotat.at/
FORTIES CROMARTY FORTH: SOUTHERLY 6 TO GALE 8, DECREASING 5 OR 6 LATER. RAIN
OR SHOWERS. MODERATE OR GOOD.
--- Begin Message ---
I am forwarding this important exchange, with my preliminary note, to most of
the people I know who are interested in the language diversity issue.
This is a long post. However, the exchange is between the two most
competent and key people in the area, and it deals with the most
important point for digital language support.
All the best.
jfc
Preliminary note.
1. review of the positions
- in many ways we can share the positions expressed by Peter
Constable (Microsoft, TC37, JAC, author of ISO 639-3).
- at the same time we understand what Mark Davis (Google, President
of Unicode, Director of the CLDR project, co-author of RFC 4646) implies.
- it is noteworthy that neither of them refers to ISO 15897. Maybe this
is the crux of the problem?
2. better later than never
Unfortunately, 15 months ago Mark Davis and Peter Constable denied me
these answers and the resulting debate. Like all the other points they
are eventually starting to debate (IANA management and access, language
registry review process management, Suppress-Script, CLDR,
multimedia, etc.) and will have to pursue (interoperability,
IRI-tags, IDNs, protocols, relational spaces, retro-meta-spam,
primary languages, language definition, support for new modes, ethics,
cultural and language diversity, revision of RFC 4646
concepts, etc.), this should have been addressed long ago.
3. status of the http://bcp47.com endeavour
I thank you for your comments. I will try to use them for a global
review of this key issue on http://bcp47.com. I remind you that we do
not want to start implementing this site before we can begin with a
comprehensive description of the BCP 47 doctrine, confirmed by the
IESG and possibly the IAB. To that end, a number of appeals
have been lodged, raising the questions needed to obtain the
necessary formal positions.
- IESG has addressed the RFC 4646 appeal in a way which did not
require an appeal to IAB.
- IESG has not addressed the interoperability, RFC 4646 respect,
ethics, IANA control, etc. issues through the appeals against the DoS
imposed on me. This is now escalated to the IAB.
- IESG had promised to expedite the responses on RFC 4647 before
publishing it. The RFC has now been published for nearly three weeks.
- the confusion over the respect of the RFC 4646 Review Process will
result in an appeal against its discrepancies. I appealed to Michael
Everson and am waiting for his answer.
- the confusion over the WG-LTRU debate due to its charter renewal
(they are starting to discuss the points they denied in their charter) has
completed its preliminary RFC 2026 procedure. An appeal to the IESG is
under preparation.
4. links
Information on CLDR and on the Unicode Consortium can be found at
http://unicode.org. This consortium works on, publishes, and maintains
many more files and tables than its contribution to ISO 10646.
Initial mail of Peter Constable
On 9/27/06, Peter Constable
<petercon(_at_)microsoft(_dot_)com> wrote:
[This is running a risk of straying off topic for this list, but
I'll post this here since it still pertains to Don's questions
regarding whether particular reg entries should have certain info
added to them.]
> From: ietf-languages-bounces(_at_)alvestrand(_dot_)no
> [mailto:ietf-languages-bounces(_at_)alvestrand(_dot_)no] On Behalf
> Of Kent Karlsson
> > that region is a key attribute of a locale,
>
> ...no.
Please explain. I guess this might depend on one's view of what the
minimal set of information categories that are required for a locale
consists of.
> > locale ID must always include a region component as well as a
> > language component.
>
> CLDR locales don't. Just about all locale data can, and often should,
> be in the "language only" named locales. Very rarely is there a difference
> from those locales that belong in the "language_territory" sublocales.
Not being a participant in the CLDR project, I'm not in a good
position to evaluate the intent of the data I see there. I do note
that, e.g. there is a file "en.xml". But clearly there is no such
thing as a region-neutral English locale: every English speaker
lives in a region where one of "M/d/yy" or "d/M/yy" is the preferred
short date format (and probably the majority live in regions that
prefer the latter), but this data file is not neutral wrt short date
format: in spite of the name, the data it contains really is
applicable to the US. Now, perhaps the intent here is that this is
data that can be used as a default if region-specific data is not
available, but it seems to me that's just a roundabout way of
saying that en-US is used as the default locale for English.
> Yes, but choosing (a single) currency or choosing a measurement
> system does not belong in a locale. Doing that is a mistake, similar to
> that of selecting character encoding via locale (as is, unfortunately, done
> in Unix/POSIX locales).
These are only ever defaults. It's not appropriate to assume that
every English speaker in the US wants a short date format of
"M/d/yy", but it is an appropriate default in that scenario. In the
same way, it's not appropriate to assume that a user in the US will
always use imperial units of measure, but it is reasonable to treat
imperial units as a default. Same for currency.
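To make the "defaults" point concrete, here is a minimal Python sketch. It assumes a hypothetical table of region-level defaults (REGION_DEFAULTS) and a hypothetical effective() helper; neither is part of CLDR or of any OS locale API, and the values are illustrative only. The region merely supplies a fallback, and an explicit user preference always wins.

```python
from typing import Optional

# Hypothetical region-level defaults; values are illustrative only.
REGION_DEFAULTS = {
    "US": {"short_date": "M/d/yy", "currency": "USD"},
    "GB": {"short_date": "d/M/yy", "currency": "GBP"},
}


def effective(key: str, region: str, user_prefs: Optional[dict] = None) -> str:
    """Return the user's explicit setting if present, else the region default."""
    if user_prefs and key in user_prefs:
        return user_prefs[key]
    return REGION_DEFAULTS[region][key]


print(effective("short_date", "US"))                            # M/d/yy
print(effective("short_date", "US", {"short_date": "d/M/yy"}))  # d/M/yy
```

Under this model a US user who prefers "d/M/yy" simply overrides the default; nothing about the region assignment has to change.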
Review of the CLDR project by Mark Davis
From: mark(_dot_)edward(_dot_)davis(_at_)gmail(_dot_)com
[mailto:mark(_dot_)edward(_dot_)davis(_at_)gmail(_dot_)com]
Now, the difference between "language" identifiers and "locale"
identifiers is notoriously slippery, so I'll provide some background
on how CLDR is actually structured, so you don't have to guess.
The CLDR data is separated into language-specific data and
non-language-specific data. The language-specific data does *not*
include items like the currencies for a country, or the weekend
days, etc.; that is all in the non-language-specific data. Here are
some examples:
http://unicode.org/cldr/data/common/collation/
http://unicode.org/cldr/data/common/main/
The non-language-specific data includes which currencies were valid
in a particular country during which years, or which languages are
customarily written in which scripts. Some examples are:
http://unicode.org/cldr/data/common/supplemental/
http://unicode.org/cldr/data/common/transforms/
The so-called locale inheritance is used for the language-specific
data, not the non-language-specific data, so it would be more
accurate to call it language inheritance. The vast majority of the
language-specific data does not differ by country. While, for
example, the content of en.xml is chosen to be appropriate for the
most populous country speaking en (the US), that doesn't mean
that content is *always* inappropriate for many of the other regions
that could use English (eg AG AI AS AU AW BB BM BS BW BZ CA CC CK CM
CX DM ER FJ FK FM GB GD GH GI GM GY HK IE IN IO JM KE KI KN KY LC LR
LS MH MP MS MT MW NA NF NG NR NU NZ PG PH PK PN PW RW SB SG SH SL SZ
TC TK TO TT TZ UG UM US VC VG VI ZA ZM ZW).
In cases where content does differ according to the region, such as
the UK, then one includes overrides of what is in en.xml. (Where the
language-specific data for two locale/language tags are the same and
different from the base, one can be aliased (either in full or in
part) to the other. Thus if en_ZW, for example, followed UK spelling
conventions, then it could be aliased to en_UK. While the files use
"_", CLDR recognizes "-" and "_" as equivalent in identifiers.)
You say:
>But clearly there is no such thing as a region-neutral English locale
This sentence is a bit slippery; it depends highly on what one means
by locale. Let me recast it. For a given type of content (eg country
names) and a given language subtag, there may be differences among
regions (as defined by BCP47) or it may be that all regions share
the same values. (For that matter, there may also be differences *within*
regions, according to sub-regions that BCP 47 isn't fine-grained
enough to distinguish; eg for some speech applications the
differences in Bostonian English may be important.)
Where there are differences in regions, the region is important.
Where there are not differences between regions, the region is not
important. Thus in many cases, the CLDR data does not differ by
country at all, so requiring a country subtag is pointless. In that
sense, I'd say your sentence
> that region is a key attribute of a locale,
is false. Region may or may not be significant, depending on the
content, and depending on the language.
If you meant to say that the *ability* to have a region as a
component of locale/language is key, then I'd agree with you --
otherwise one couldn't distinguish between en-US and en-UK content.
I do, however, agree with you on the major point: this is all about
*defaults*; identifiers have an inherent limitation -- they
represent some class of users, within which there will always be variations.
Mark
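The language inheritance described in Mark's message (language-only data such as en.xml, regional overrides, aliasing, and "-"/"_" equivalence) can be sketched as a toy Python model. Everything here is an assumption for illustration: the DATA and ALIASES tables, their keys and values, and the lookup() function are hypothetical. Real CLDR data ships as XML files and is normally consumed through libraries such as ICU, whose fallback rules are more detailed than this. The sketch uses GB, the ISO 3166 code for the United Kingdom.

```python
# Toy model of CLDR-style language inheritance (hypothetical data and names).
# Each entry holds only the values that differ from its parent.
DATA = {
    "root": {"short_date": "y-MM-dd"},
    "en": {"short_date": "M/d/yy", "spelling": "US"},     # chosen for the US
    "en-GB": {"short_date": "d/M/yy", "spelling": "GB"},  # regional overrides
}

# A tag whose language-specific data is identical to another tag's data
# can be aliased to it instead of duplicating the overrides.
ALIASES = {"en-ZW": "en-GB"}


def normalize(tag: str) -> str:
    """Treat '-' and '_' as equivalent separators."""
    return tag.replace("_", "-")


def lookup(tag: str, key: str) -> str:
    """Walk from the most specific tag toward root until the key is found."""
    tag = normalize(tag)
    while True:
        tag = ALIASES.get(tag, tag)
        if key in DATA.get(tag, {}):
            return DATA[tag][key]
        if tag == "root":
            raise KeyError(key)
        # Drop the last subtag: en-GB -> en, and a bare language -> root.
        tag = tag.rsplit("-", 1)[0] if "-" in tag else "root"


print(lookup("en_ZW", "short_date"))  # d/M/yy (via the alias to en-GB)
print(lookup("en-AU", "short_date"))  # M/d/yy (inherited from en)
print(lookup("fr", "short_date"))     # y-MM-dd (inherited from root)
```

A region-specific entry is consulted only when one exists; otherwise the language-only entry answers, which matches the observation that most language-specific data need not differ by country.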
At 07:13 28/09/2006, Peter Constable wrote:
(Note: the typography results from Peter Constable's use of HTML in his post.)
As I said earlier, this very much depends on one's notion of what
a locale is. You say,
"... it depends highly on what one means by locale. Let me recast
it. For a given type of content (eg country names) and a given
language subtag, there may be differences among regions, or it may
be that all regions share the same values."
You are picking out one particular data category, country names.
That is not a locale, by any usage I've ever seen before now! I
don't in the slightest question that, for a single data category
for which the values are linguistic expressions, region is not
necessarily relevant. But again, that is not a locale.
You are casting "locale" as a data collection that is completely
variable wrt the data categories it contains, with no minimal set of
required data categories (there's only the proviso that there be
at least one kind of content). I can easily imagine that's a
useful approach to managing data in a repository like CLDR, where
the only functional requirement is data management. But a data
collection in that context is just that, a collection of data, not a
locale. A locale is a locale by virtue of its role within a software
implementation.
So, while I have no problem saying that a set of country names in
English is locale data, I would not say that makes it a locale. But,
of course, the way I am casting it leaves open the question of just
what the "role within a software implementation" needs to look like.
And, of course, I'm assuming a model that's been around for a
while in which a software implementation has various functions to
produce various kinds of culture-dependent results -- provide a
country name, format a numeric value as a currency string, sort a
set of data, etc., where all of those functions have in common a
particular parameter that uses one set of system-recognized symbols
to determine the culture to be assumed in producing any of those
results. In that model, I contend that region is always a key factor
in the cultural distinctions because there is always one or more
functions that produce results that are regionally determined or
even specific to a particular region: date format, default currency
symbol, etc.
And, of course, it's possible to imagine an implementation that
doesn't use that culture-atom model, i.e. an implementation in
which different sets of symbols are used for parameterizing
different clusters of functions. The whole set of functions still
have in common that they produce some kind of culture-dependent
result, but different ones use different parameters to determine
different cultural attributes as are relevant for the given
function. So, for instance, a function that formats a numeric date
value as a day name in a given language might use as a parameter
just a language ID with no region element, while another function
that formats a numeric value as a currency string might use as a
parameter just a region ID with no language element. Perhaps
software implementations in the future will all work this way such
that there are no longer any functions that rely on parameters that
correspond to "locale IDs" / LCIDs as those are understood in
the model described in the preceding paragraph. In that case, you
might well have a situation in which IDs with region elements are
needed only exceptionally, as you are suggesting. But in that
case, I'd say that those identifiers that are used are IDs for
some other notions, not locale IDs.
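The contrast between the two implementation models Peter describes can be sketched in Python. This is a hypothetical illustration only: the tables, the LocaleId type, and all function names are invented for the example, the values are illustrative, and weekday 0 means Monday (Python's datetime.weekday() convention). In the "culture atom" model every function receives the same ID, which must therefore carry a region; in the second model each function takes only the parameter it actually uses.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; values are illustrative only.
DAY_NAMES = {
    "en": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
           "Saturday", "Sunday"],
    "fr": ["lundi", "mardi", "mercredi", "jeudi", "vendredi",
           "samedi", "dimanche"],
}
CURRENCIES = {"US": "USD", "GB": "GBP", "FR": "EUR"}


# Model 1: a single "culture atom" passed to every function. Because some
# functions (e.g. currency) are region-dependent, the ID must carry a region.
@dataclass(frozen=True)
class LocaleId:
    language: str
    region: str


def day_name_atom(loc: LocaleId, weekday: int) -> str:
    return DAY_NAMES[loc.language][weekday]  # the region is ignored here


def currency_atom(loc: LocaleId) -> str:
    return CURRENCIES[loc.region]  # the language is ignored here


# Model 2: each function takes only the parameter relevant to it.
def day_name(language: str, weekday: int) -> str:
    return DAY_NAMES[language][weekday]


def default_currency(region: str) -> str:
    return CURRENCIES[region]
```

In the second model a bare language ID such as "fr" suffices for day names, while in the first every call carries a region whether or not the function uses it.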
Again, this is probably straying off topic for this list, so I
should let this one go.
Peter
--- End Message ---
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf