ietf-822

Re: 10646, and all that

1993-03-05 18:38:22
      I think there is one other confusion here, so let me
  restate something that others have tried to say in other
  ways.  We are really not talking about "fonts" here.

Exactly.

Thank you for putting into words what I could not.

 If Text/plain is going to allow Charset=10646, then
 (as a user) I would expect complete polylingual
 transmission to be completely transmissible,

  I think this is a reasonable position, but at the wrong
  place and the wrong time. ....
  So the expectation is unreasonable because JTC1/SC2
  didn't successfully do what you would have liked them to
  do.   Life is hard.

Indeed, I was blissfully ignorant of all the committees you mention. I probably
would have tried to participate had I been aware of them and ether-connected,
but I wasn't. Still, that doesn't address the problem of how we are going to
cope with 10646 plain-text transmission: users will want some mechanism, so one
must be sought.

  And such a list could not include languages whose
  representations had been collapsed in severe ways in
  10646.

But that is precisely where it would prove most useful. If it were possible to
know that only Kanji was sent, then no ambiguity would be possible. Polylingual
J/K text would also benefit from knowing that C is absent, so it would even be
useful to allow declaring that absence in the list.
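
A minimal sketch of the declared-list idea, assuming a hypothetical mechanism by
which the sender lists the languages present in a message (nothing like this is
defined for charset=10646); the names HAN_LANGS, han_candidates and
han_is_unambiguous are illustrative only:

```python
from __future__ import annotations

# Hypothetical: assumes the sender declares which languages a message
# contains, e.g. via some new Content-Type parameter. No such parameter
# exists; this only shows how a declared list narrows the Han reading.
HAN_LANGS = {"zh", "ja", "ko"}  # languages whose text may use unified Han


def han_candidates(declared: set[str]) -> set[str]:
    """Languages a unified Han character in this message could belong to."""
    return declared & HAN_LANGS


def han_is_unambiguous(declared: set[str]) -> bool:
    """True when exactly one Han-using language was declared."""
    return len(han_candidates(declared)) == 1


if __name__ == "__main__":
    print(han_is_unambiguous({"ja"}))   # only Kanji declared
    print(han_candidates({"ja", "ko"}))  # J/K only: Chinese ruled out
```

Declaring {"ja", "ko"} does not settle every Han character, but it removes
Chinese from consideration, which is the partial benefit described above.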

      And I don't think that your heuristics are going to
  work in the C/J/K case anyway.  If I send you real
  multilingual text that contains mostly Japanese Han
  (e.g., when you go looking, you find some kana and
  thereby conclude "Japanese") but some embedded Chinese
  text, you aren't going to get the latter right.
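
To make that failure mode concrete, here is a minimal sketch of the naive
whole-message rule ("any kana means Japanese"). The function names and the
Unicode block ranges checked are illustrative choices, not part of any proposal:

```python
from __future__ import annotations


def is_kana(ch: str) -> bool:
    # Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF) blocks
    return 0x3040 <= ord(ch) <= 0x30FF


def is_han(ch: str) -> bool:
    # CJK Unified Ideographs main block only (U+4E00..U+9FFF)
    return 0x4E00 <= ord(ch) <= 0x9FFF


def label_han_by_kana(text: str) -> list[tuple[str, str]]:
    """Naive heuristic: if any kana appears anywhere in the message,
    label every Han character Japanese; otherwise label it Chinese."""
    lang = "ja" if any(is_kana(c) for c in text) else "zh"
    return [(c, lang) for c in text if is_han(c)]


if __name__ == "__main__":
    mostly_japanese = "日本語の文です。"  # contains kana
    embedded_chinese = "他说中文。"       # Chinese clause, no kana
    # Every Han character, including those of the Chinese clause,
    # comes back labelled "ja".
    print(label_han_by_kana(mostly_japanese + embedded_chinese))
```

Because the decision is made once per message, the embedded Chinese is
mislabelled; any per-span refinement needs some other signal to find the span
boundaries.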

That could well be. I was hoping to avoid delving *deeply* into natural-language
parsing (ugh, shudder), but had hoped that enough information could be gleaned
from quotation marks and other punctuation to be useful.

Of course, if we can actually demonstrate that no heuristic is likely to be
satisfactory, then that will allow us to move on to consider more-intrusive
solutions, possibly 

   charset=cjk-tagged-10646.

Of course, the principal drawback to this is the requirement that the sender
employ it, but since we are early on the curve of 10646 usage, especially in
email, I would hope that a well-designed scheme would be accepted.

Since I have no access to 10646, and must borrow Unicode from a friend, I must
depend on others for the details.
--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>

