ietf-822

Re: All these lonely accents, where do they all come from?

2002-05-09 04:14:00

In <200205080520(_dot_)g485Kkg28063(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu> 
Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:


the reason that Unicode has multiple representations for characters in
the first place - because they wanted to support invertible
translation to and from legacy character sets without information loss

Nonsense. One can't convert UTF-8 to ISO 8859-1, for example, without
information loss.

I suppose I should have been more precise, but I thought it would be
obvious. The criterion was to be able to translate from a legacy charset
to Unicode and back to the original charset without lossage.

No, that is NOT the required property.

You start with text A in some horrible legacy charset C (which knows nothing
of normalization and embodies horrendously redundant notations). You
convert it into Unicode U(A) and normalize it N(U(A)) (because converters,
like keyboards, are one of the small set of programs that MUST
normalize). Then you convert it back into the legacy charset C(N(U(A))).

You cannot expect
        A = C(N(U(A)))
(though you might just be lucky, and it would likely stabilize after one
or more further cycles round the loop).
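
To make the loop concrete, here is a minimal sketch in Python. It is an
illustration only and rests on assumptions that are not from the discussion
above: cp1258 (Windows Vietnamese) stands in for the legacy charset C, since
it carries both a precomposed "a with grave" and a separate combining grave
accent, and NFC stands in for the normalization N. The helper name
round_trip is hypothetical.

```python
import unicodedata

# Assumption: cp1258 plays the role of the legacy charset C. It can spell
# "a with grave" two ways: precomposed, or "a" plus a combining grave.
LEGACY = "cp1258"


def round_trip(a: bytes, charset: str = LEGACY) -> bytes:
    """Compute C(N(U(A))): decode, normalize to NFC, re-encode."""
    u = a.decode(charset)                  # U(A)
    n = unicodedata.normalize("NFC", u)    # N(U(A))
    return n.encode(charset)               # C(N(U(A)))


# A uses the redundant notation: base letter followed by a combining accent.
a = "a\u0300".encode(LEGACY)

once = round_trip(a)       # normalization picks the precomposed letter
twice = round_trip(once)   # a further cycle changes nothing more

print("A == C(N(U(A))):", a == once)
print("stable after one cycle:", once == twice)
```

In this sketch the first pass changes the bytes, because normalization
replaces the two-character spelling with the precomposed one, and a second
pass leaves the result alone.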

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
