Keith 7-bits-forever Moore writes:
> the reason that Unicode has multiple representations for characters in
> the first place - because they wanted to support invertible
> translation to and from legacy character sets without information loss
Nonsense. One can't convert UTF-8 to ISO 8859-1, for example, without
information loss.
If you're trying to point to some actual feature of Unicode, give a
complete quote and a precise reference---and then explain how this
justifies your claims about unnormalized text entering the system.
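
To make the ISO 8859-1 point concrete, here is a minimal Python sketch. The sample string and the helper name `to_latin1` are made up for illustration; the only assumption is that unmappable characters are replaced rather than rejected. U+20AC (the euro sign) has no ISO 8859-1 code point, so it cannot survive the trip:

```python
# Illustrative sketch: convert UTF-8 to ISO 8859-1 and back, then compare.
text = "price: 5\u20ac"          # U+20AC is not in ISO 8859-1
utf8_bytes = text.encode("utf-8")

def to_latin1(data: bytes) -> bytes:
    """Hypothetical helper: UTF-8 bytes -> ISO 8859-1 bytes.

    Characters without an ISO 8859-1 code point become b'?'.
    """
    return data.decode("utf-8").encode("latin-1", errors="replace")

latin1_bytes = to_latin1(utf8_bytes)
round_trip = latin1_bytes.decode("latin-1")

# The original character is gone; the conversion is not invertible.
lossy = round_trip != text
```

With the default `errors="strict"` the encode step would raise `UnicodeEncodeError` instead; either way the text cannot be carried across intact.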
> then there are existing editors and keyboard drivers that don't generate
> normalized text.
Name them, Keith.
> for all I know, it may not even be desirable to do so -
> there may be some languages for which this would be seen as a defect.
Do you understand what normalization is, Keith?
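
For reference, a small sketch of what normalization does, using Python's standard `unicodedata` module. The choice of NFC and of the sample character is an assumption made for illustration, not something taken from the thread: two different code-point sequences for the same character compare unequal until both are normalized to one form.

```python
import unicodedata

precomposed = "\u00e9"    # e-acute as a single code point
decomposed = "e\u0301"    # 'e' followed by a combining acute accent

# Without normalization the two sequences are different strings.
same_raw = precomposed == decomposed

# After normalizing to NFC, both are the single code point U+00E9.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```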
> > Then explain why you think normalization should be put into every
> > program that _handles_ text, instead of the much smaller set of programs
> > (notably, keyboard interfaces) that _produce_ text.
> I didn't say that.
Then why are you demanding that thousands of Internet programs acquire
Unicode normalization features?
---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago