Text doesn't appear out of thin air, Keith.
thank you for that brilliant insight!
Please explain exactly where you think unnormalized UTF-8 text enters
the world's computer systems.
well for starters, how about the reason that Unicode has multiple
representations for characters in the first place - because they wanted
to support invertible translation to and from legacy character sets without
information loss, and also because they combined several character sets
whose repertoires overlapped, but where in some cases it still made
sense to assign distinct codepoints.
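[to make the multiple-representations point concrete, a small sketch in
Python - the language is just for illustration, nothing in the thread is
language-specific. "é" can be stored either as a single precomposed
codepoint or as a base letter plus a combining accent, and the two are
byte-for-byte different until normalized:]

```python
import unicodedata

# Two valid Unicode representations of the same character "é":
precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 + U+0301 COMBINING ACUTE ACCENT

# They compare unequal as raw codepoint sequences...
assert precomposed != decomposed

# ...but normalization maps one onto the other in either direction.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
```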
then there are existing editors and keyboard drivers that don't generate
normalized text. for all I know, it may not even be desirable to do so -
there may be some languages for which forced normalization would be seen
as a defect.
I'm not going to second-guess the designers of charsets for other languages
(or Unicode for that matter) and say that all text in those charsets should
always be normalized.
as for why those legacy charsets supported combining characters - you'll
have to ask the designers of those charsets.
Then explain why you think normalization should be put into every
program that _handles_ text, instead of the much smaller set of programs
(notably, keyboard interfaces) that _produce_ text.
I didn't say that. I just said that putting normalization into the
set of programs that produce text is beyond our ability to influence,
and that it wouldn't be sufficient anyway.
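[which is why text-handling programs end up normalizing defensively at
the point of comparison. a minimal sketch of that practice, again in
Python just for illustration - `same_text` is a hypothetical helper name,
not anything from the thread:]

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings under NFC, so that precomposed and
    decomposed forms of the same character compare equal even when
    the producing program didn't normalize its output."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# "café" typed on two different systems may arrive in either form:
assert same_text("caf\u00e9", "cafe\u0301")
```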
Keith
p.s. as far as I know this one is a new topic for us.
want to write up a web page on it?