ietf-822

Re: All these lonely accents, where do they all come from?

2002-05-07 19:43:01

> Text doesn't appear out of thin air, Keith.

Thank you for that brilliant insight!
 
> Please explain exactly where you think unnormalized UTF-8 text enters
> the world's computer systems.

Well, for starters, how about the reason that Unicode has multiple
representations for characters in the first place? It is because they wanted
to support invertible translation to and from legacy character sets without
information loss, and also because they combined several character sets
whose repertoires overlapped, yet in some of those overlapping cases it made
sense to assign different code points.
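A minimal sketch, assuming Python 3 and its standard `unicodedata` module, of the two cases just described: a precomposed character kept for lossless round-trips with a legacy charset (ISO 8859-1 here), and a duplicate character inherited from merging repertoires (ANGSTROM SIGN). The specific characters are illustrative choices, not taken from the thread.

```python
import unicodedata

# Two canonically equivalent spellings of "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (also exists in ISO 8859-1)
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# As code point sequences they differ...
print(precomposed == decomposed)                                # False

# ...but NFC maps both to the precomposed form, and NFD to the decomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# The precomposed form lets ISO 8859-1 text round-trip without loss.
latin1 = precomposed.encode("latin-1")
assert latin1.decode("latin-1") == precomposed

# The decomposed form has no ISO 8859-1 encoding until it is normalized.
try:
    decomposed.encode("latin-1")
except UnicodeEncodeError:
    print("decomposed form is not representable in ISO 8859-1")
print(unicodedata.normalize("NFC", decomposed).encode("latin-1"))  # b'\xe9'

# A duplicate from merged repertoires: ANGSTROM SIGN (U+212B) normalizes
# to LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
print(unicodedata.normalize("NFC", "\u212b") == "\u00c5")       # True
```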

Then there are existing editors and keyboard drivers that don't generate
normalized text. For all I know, it may not even be desirable for them to do
so; there may be some languages for which this would be seen as a defect.
I'm not going to second-guess the designers of charsets for other languages
(or of Unicode, for that matter) and say that all text in those charsets should
always be normalized.
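Because producers such as editors and keyboard drivers may emit either form, a program that needs to compare text it receives cannot count on two canonically equivalent strings matching code point for code point. A minimal sketch, assuming Python 3; `canonical_key` and `same_text` are hypothetical helper names, and NFC is one common choice of normalization form rather than the only reasonable one:

```python
import unicodedata

def canonical_key(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: normalize text so canonically equivalent
    strings compare equal, whatever form the producer emitted."""
    return unicodedata.normalize(form, text)

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalization."""
    return canonical_key(a) == canonical_key(b)

# The same name as produced by two different systems.
from_system_a = "Andr\u00e9"    # precomposed é
from_system_b = "Andre\u0301"   # e + combining acute accent

print(from_system_a == from_system_b)           # False: raw comparison misses the match
print(same_text(from_system_a, from_system_b))  # True

# Normalization is idempotent, so re-normalizing normalized text is harmless.
k = canonical_key(from_system_b)
print(canonical_key(k) == k)                    # True
```

NFC is used here because it favors the precomposed characters that legacy charsets also contain; NFKC would additionally fold compatibility characters, which discards more information.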

As for why those legacy charsets supported combining characters, you'll
have to ask the designers of those charsets.

> Then explain why you think normalization should be put into every
> program that _handles_ text, instead of the much smaller set of programs
> (notably, keyboard interfaces) that _produce_ text.

I didn't say that. I just said that putting normalization into the
set of programs that produce text was beyond our ability to influence,
and that it wouldn't be sufficient anyway.

Keith 

P.S. As far as I know, this one is a new topic for us.
Want to write up a web page on it?
