Re: Troubles with UTF-8

At 13:44 23/12/2005, Masataka Ohta wrote:

Tom.Petch wrote:
> Overall, my perception is that we have the political statement -UTF-8 will be
> used - but have not yet worked out all the engineering ramifications.

Correct. Like so many results of IETF, enforcing Unicode just does
not work.

Amen. This is an architectural feature decided for political reasons which does not scale.

But, never mind. Unicode has nothing to do with the internationalization.

I beg to differ on wording. Internationalization is an IETF/Unicode word. It is part of the equation "globalization = global environment internationalization + local environment localization". Its IBM understanding is to reduce the lingual barrier between the core and the ends it relates with. I think it is appropriate to the IETF US-ASCII based Internet technology.

But the real world is "multinationalization" (to keep the same image, or multilingualization): the same, but for every end-to-end relation (and language). Let us consider the IETF RFC 2277 proposition: content must be in Unicode (client system) and the protocol is in US-ASCII (core system). A document may look as if it is in a given language, but when you read its source it is in English interspersed with Unicode-encoded text.

The internationalization (RFC 3066bis) culture is unilateral. Networking calls for a multilateral culture architecture (RFC 4151 may help).

The only solution I see, which addresses the requirements of Tom Petch, is to go through a common universalisation layer (not charset dependent), accepting the existing US-ASCII environment of Masataka Ohta as a maximum. It should then go down to Hexa. This would get rid of the Unicode-based layer violations and permit a full charset support strategy where Unicode could fully play its role of common reference.

Obviously two-tier policies based on langtags could not develop as easily as planned.

jfc

> others to
> 0000-00FF, essentially Latin-1, which suits many Western languages but
> is not truly international.

The only appropriate subset of Unicode is 0000-007f, ASCII. Latin-1,
which introduced the confusions of the currency symbol and NBSP, is
already overkill.

> Unicode lacks a no-op, a meaningless octet,

The confusion of NBSP implies that spaces are not so meaningful
octets so that it may be replaced by line break characters.

So, the situation is worse than you would have considered and even
full Latin-1 is hopeless.

Just interpret UTF-8 ASCII.
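
As an illustration of the ranges discussed above, here is a minimal Python sketch. It only checks properties of the encodings themselves: code points 0000-007F encode to the same single octet in ASCII and UTF-8, code points 0080-00FF (the upper half of Latin-1) take two octets in UTF-8, and NBSP (U+00A0) is one octet in Latin-1 but two in UTF-8. The helper names (utf8_length, is_ascii_only) are hypothetical and chosen for this example only; they are not taken from any message in this thread.

```python
def utf8_length(code_point: int) -> int:
    """Number of octets UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))


def is_ascii_only(data: bytes) -> bool:
    """True if every octet is in 00-7F, so ASCII and UTF-8 decoding agree."""
    return all(octet < 0x80 for octet in data)


# 0000-007F: the UTF-8 encoding is the single octet equal to the code point.
ascii_identical = all(
    chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x00, 0x80)
)

# 0080-00FF: every code point needs two octets in UTF-8.
upper_two_octets = all(utf8_length(cp) == 2 for cp in range(0x80, 0x100))

# NBSP (U+00A0) is one octet in Latin-1 and two octets in UTF-8.
nbsp = "\u00a0"
nbsp_latin1 = nbsp.encode("latin-1")
nbsp_utf8 = nbsp.encode("utf-8")

if __name__ == "__main__":
    print("ASCII range identical in UTF-8:", ascii_identical)
    print("0080-00FF always two octets:", upper_two_octets)
    print("NBSP in Latin-1:", nbsp_latin1.hex(), "in UTF-8:", nbsp_utf8.hex())
    print("b'hello' is ASCII only:", is_ascii_only(b"hello"))
```

A receiver that accepts only octets for which is_ascii_only returns True can treat its input as ASCII or as UTF-8 interchangeably; anything outside that range requires a decision about the charset.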

                                                        Masataka Ohta


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf


