Bruce Lilly wrote:
> Andrew Gierth wrote:
>> since there have been no substantive changes made to UTF-8 _ever_
>> since its adoption as any sort of standard, why would you expect
>> future changes to introduce incompatibilities?
> Bad premise; every time Unicode changes substantively, the UTF-8
> specification necessarily also changes, since it is a transformation
> between Unicode and an octet stream.
Why would you expect Unicode to change substantively?
The number of characters used for human communication doesn't seem to be
rising much, and there's plenty of space left in the current
specification. IIRC Unicode still assigns fewer than 200,000 of the
1,114,112 possible code points (U+0000 through U+10FFFF).
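Those figures are easy to sanity-check with Python's `unicodedata` module. A quick sketch (the exact count depends on the Unicode version your interpreter ships with; this counts everything except unassigned, surrogate, and private-use code points):

```python
import unicodedata

TOTAL = 0x110000  # code points U+0000..U+10FFFF = 1,114,112

# Count code points actually assigned to characters, excluding
# unassigned (Cn), surrogates (Cs), and private use (Co).
assigned = sum(
    1
    for cp in range(TOTAL)
    if unicodedata.category(chr(cp)) not in ("Cn", "Cs", "Co")
)

print(f"{assigned:,} of {TOTAL:,} ({assigned / TOTAL:.1%}) assigned")
```

On current Unicode versions this still comes out well under 200,000, i.e. under the 20% mark.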
I suppose you could argue that Unicode adds alphabets. But do you think
Unicode still hasn't reached the 20% mark?
> When the Unicode consortium decides to include chicken scratching as
> "characters" and extends the maximum width to beyond 32 bits, even
> the 5- and 6-byte sequences (which exist in some "utf-8"
> specifications on the octet-stream side, but not in others) will have
> to change.
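For reference, the 5- and 6-byte forms come from the original UTF-8 definition (RFC 2279), which could encode up to 31 bits; RFC 3629 later dropped them, capping sequences at 4 bytes. A sketch of the range each sequence length covers:

```python
# Largest code point an n-byte sequence can carry under the original
# UTF-8 scheme (RFC 2279): the lead byte of an n-byte sequence keeps
# 7 - n payload bits (for n >= 2), and each continuation byte adds 6.
def max_codepoint(n_bytes: int) -> int:
    if n_bytes == 1:
        return 0x7F  # ASCII range
    payload_bits = (7 - n_bytes) + 6 * (n_bytes - 1)
    return (1 << payload_bits) - 1

for n in range(1, 7):
    print(f"{n} byte(s): up to U+{max_codepoint(n):X}")
```

Even the 6-byte form tops out at 0x7FFFFFFF, i.e. 31 bits, so going "beyond 32 bits" would indeed require redesigning the encoding, not just re-enabling the long sequences.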
"Will have to change"... There's a little more than 5,000 languages on
the globe. If every one of them were to invent its own kanji-like
writing system with about 100,000 characters, that still wouldn't fill
32 bits.
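The back-of-the-envelope arithmetic above, spelled out:

```python
languages = 5_000            # rough count of living languages
chars_per_script = 100_000   # a CJK-scale repertoire for each one
needed = languages * chars_per_script

print(f"{needed:,} code points needed")  # 500,000,000
print(needed < 2**31)  # True: fits even in a signed 31-bit space
```

Half a billion code points would of course dwarf today's 1,114,112-point code space, but it sits comfortably inside 32 bits, and even inside the 31 bits the original 6-byte UTF-8 form could reach.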
--Arnt