ietf-822

10646 etc.

1993-02-06 18:25:49
... 10646 defines what a text element is: essentially,
a (base) character followed by an infinite sequence of combining marks.

The above model has a fatal defect in an interactive environment. You can't
determine where one symbol ends until you read the next character that is
not a combining mark. Thus, you can't display a symbol unless you look ahead
an extra character...
It should be noted for Henry Spencer that the above BUG was also pointed
out repeatedly by many people during the development phase of DIS 10646-1.2.
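
To make the lookahead complaint concrete, here is a minimal sketch in Python.
It assumes that "combining mark" can be approximated by the Unicode general
categories starting with "M", which is not necessarily the definition 10646
uses. The names ElementSplitter, feed and flush are hypothetical, made up for
this illustration only.

```python
import unicodedata


def is_combining(ch):
    # Assumption: treat general categories Mn, Mc and Me as combining marks.
    return unicodedata.category(ch).startswith("M")


class ElementSplitter:
    """Collects characters one at a time into text elements."""

    def __init__(self):
        self.pending = ""

    def feed(self, ch):
        """Accept one character; return a finished text element or None."""
        if self.pending and not is_combining(ch):
            # Only the arrival of a new base character proves that the
            # previous element is complete.
            done, self.pending = self.pending, ch
            return done
        self.pending += ch
        return None

    def flush(self):
        """At end of input, return whatever is still pending (or None)."""
        done, self.pending = self.pending, ""
        return done or None


splitter = ElementSplitter()
# "e" followed by COMBINING ACUTE ACCENT (U+0301), then "a".
results = [splitter.feed(ch) for ch in "e\u0301a"]
last = splitter.flush()
```

In this sketch, feeding "e" and then the accent returns nothing; the accented
"e" is only handed back once "a" has arrived, and "a" itself stays pending
until flush() is called. That one-character delay is the lookahead being
objected to.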

I don't know of anyone who is really happy with the "combining marks"
business; it's not really consistent with the "one glyph, one code"
philosophy of Unicode/10646/etc.  I'd predict that there will be a lot
of "10646" implementations that will quietly ignore it.

While unpleasant, I don't see that this is a fatal flaw from a pragmatic
viewpoint.  If you see Unicode as something that is supposed to be perfect
and last for all time, it's nasty.  If you see Unicode as a reasonable
compromise that will make the next decade rather less painful than it
would otherwise be, but will need revision or replacement eventually,
it's regrettable but not disastrous.

                                         Henry Spencer at U of Toronto Zoology
                                          
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry
