ietf-822

Re: 10646, and all that

1993-03-05 06:38:28
I agree except for the multilanguage qualification.  If Text/plain is going to
allow Charset=10646, then (as a user) I would expect complete polylingual
transmission to be completely transmissible, and in that lies the rub (for me
as an implementor).

I think this is a reasonable position, but at the wrong place and the
wrong time.  If you have these expectations, then "you", and all other
users with them, should have insisted on enough specificity in 10646
that this was feasible, without additional qualification, profiling,
markup, etc.  In this regard, DIS-1 was better than DIS-2, but suffered
from other defects.  But, whether by silence or failures in the
representation process to JTC1, the position advocating that level of
specificity failed.  Gone.  Done for.   And it is really too late right
now to make 10646 something that it isn't.  Maybe in the next revision.

So the expectation is unreasonable because JTC1/SC2 didn't successfully
do what you would have liked them to do.   Life is hard.

  Anyone for Content-Language: ? 

It could help my heuristic proposal, but the natural interpretation would be
mono-lingual.
   Unless we permitted a list, yes.  And such a list could not include
languages whose representations had been collapsed in severe ways in
10646.  Again, life is hard.

    And I don't think that your heuristics are going to work in the
C/J/K case anyway.  If I send you real multilingual text that contains
mostly Japanese Han (e.g., when you go looking, you find some kana and
thereby conclude "Japanese") but some embedded Chinese text, you aren't
going to get the latter right.  The only solutions I can see all involve
explicit sender marking of the inserted Chinese Han.  We can debate the
kind of "marking", but...
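
To make the failure concrete, here is a minimal sketch of that kind of
heuristic.  The function name and the code ranges (taken from the present-day
Unicode block layout) are assumptions for illustration, not the actual
proposal under discussion.

```python
def guess_cjk_language(text):
    """Hypothetical heuristic: pick ONE language for the whole text by script."""
    # Assumed ranges: Hiragana plus Katakana, Hangul syllables, unified Han.
    has_kana = any(0x3040 <= ord(ch) <= 0x30FF for ch in text)
    has_hangul = any(0xAC00 <= ord(ch) <= 0xD7A3 for ch in text)
    has_han = any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)
    if has_kana:
        return "ja"   # finding some kana leads to the conclusion "Japanese"
    if has_hangul:
        return "ko"
    if has_han:
        return "zh"   # Han with no kana or hangul is guessed to be Chinese
    return None


# Japanese sentence with an embedded Chinese word: the whole text, including
# the Chinese Han at the end, comes out labelled as Japanese.
mixed = "これは日本語です。中文"
print(guess_cjk_language(mixed))
print(guess_cjk_language("中文"))
```

The embedded Chinese is only recognized when it arrives on its own, which is
why some explicit marking by the sender seems unavoidable.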

    I think there is one other confusion here, so let me restate
something that others have tried to say in other ways.  We are really
not talking about "fonts" here.  A font provides an extremely specific
way of representing each character-symbol contained in it.  Fonts are
specific outputs of a creative/design process, and most of them are
protected by copyright or worse.  The Roman character "A" is different
in Times Roman (tm) and Century Schoolbook (tm) even though it takes a
moderately sophisticated observer who is paying attention to tell the
difference.
    We don't want that level of specificity in plain text.  If nothing
else, it would lead to a lot of rules about when font specification
could or could not be ignored, and the risk of having messages rejected
because the receiving site didn't have the right font licenses or
libraries.  Perfectly ok in application/fancy-markup or
application/page-description-language as far as I'm concerned, but not
in plain text.
    That takes us back to what the ISO character sets do define.  They
map a code point to a character abstraction, such that code position
0041 (10646, UCS-2, but these comments apply to everything from ASCII
up) can be displayed as either "A" in Times Roman, or "A" in Century
Schoolbook or "A" in nondescript-terminal-ASCII.  Displaying it as "C"
in any of those fonts would be at least a bizarre interpretation of the
standard and a clear violation of intent, and probably non-conforming.
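
A small sketch of the code-point-to-abstraction mapping, using Python's
standard unicodedata module as a stand-in for the 10646 tables (an assumption
made for illustration):

```python
import unicodedata

code_point = 0x0041
character = chr(code_point)

# The standard names the abstraction; it says nothing about which font
# is used to draw it.
name = unicodedata.name(character)
print(f"{code_point:04X} -> {name}")
```

The name identifies the abstract character; the choice of Times Roman,
Century Schoolbook, or a terminal font is left entirely to the display end.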
     Now the issue we are fighting with is whether the mappings of 10646
provide sufficiently precise character abstractions that the results can
be rendered into displayed characters (in some choice of font) without
loss of information.  For Western European languages, the answer is
pretty much "yes, it does".  For Asian languages, the answer is, at
best, more complicated, and we can clearly construct examples where
it does not.  This is still not a "font" issue; it is an issue of
whether a particular code point is displayed in an acceptable (any
acceptable) font for Japanese or whether it is displayed in an
acceptable font for Chinese or Korean, given that those font choices are
largely disjoint.
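
As an illustration, consider one unified Han character that is commonly cited
as being drawn differently in Japanese and Chinese typography.  The variable
names are hypothetical, and the choice of this particular character is mine,
not something from the standard's text.

```python
# The same unified code point, once taken from Japanese text and once
# from Chinese text.
from_japanese_text = "\u76f4"
from_chinese_text = "\u76f4"

# In plain text the two are indistinguishable: nothing in the coded
# character says which family of fonts the receiver should use.
print(from_japanese_text == from_chinese_text)
print(f"{ord(from_japanese_text):04X}")
```

Once the text has been coded this way, the receiver has only the code point
to go on, so any choice between a Japanese and a Chinese rendering has to
come from information outside the character itself.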

    --john
