A "charset" should provide all the profiling information to uniquely
map a byte stream to glyphs.
Thus, bare Unicode, which can't map some Devanagari and some Han correctly,
can't be a "charset".
Actually, this is true of a wide variety of things that you might think
are charsets. One of them is ASCII! The ASCII standard is most carefully weasel-worded
so that it does *not* demand that you map bytes to glyphs in the way you
might think it required. This was to accommodate an obscure historical
problem: it was politically necessary that a 64-character subset of
ASCII -- codes 32 through 95 -- accommodate PL/I. This causes two
problems: ASCII has no "not" symbol, and the ASCII or-bar is not in
the 64-character subset. The result is a standard which is vague enough
about the appearance of the glyphs that it is legitimate to print code 33
(exclamation point) as or-bar and code 94 (circumflex) as not-sign.
This, incidentally, is the reason why the or-bar symbol in many fonts
has a small break in the middle: early revs of the ASCII standard showed
it that way in hopes of minimizing confusion with a code-33 or-bar.
(The break no longer appears in the standard, but a lot of hardware
suppliers haven't noticed.)
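A small Python sketch (not from the original post) makes the code arithmetic concrete, using decimal code values. The names SUBSET, in_subset, PLI_GLYPHS, and render are hypothetical; the glyph table only illustrates the kind of alternate rendering the vague wording permits, not any particular printer's behavior.

```python
# Sketch: the 64-character ASCII subset (decimal codes 32..95,
# space through underscore) that had to accommodate PL/I.
SUBSET = [chr(c) for c in range(32, 96)]

def in_subset(ch: str) -> bool:
    """True if ch falls in the 64-character subset (codes 32..95)."""
    return 32 <= ord(ch) <= 95

print(len(SUBSET))               # 64 characters
print(ord("|"), in_subset("|"))  # 124 False: or-bar lies outside the subset
print(ord("!"), in_subset("!"))  # 33 True: exclamation point is inside
print(ord("^"), in_subset("^"))  # 94 True: circumflex is inside

# ASCII (codes 0..127) contains no not-sign; U+00AC is outside it.
print(any(chr(c) == "\u00ac" for c in range(128)))  # False

# Hypothetical PL/I-oriented rendering: same bytes, different glyphs.
PLI_GLYPHS = {33: "|", 94: "\u00ac"}

def render(text: str) -> str:
    """Show code 33 as or-bar and code 94 as not-sign."""
    return text.translate(PLI_GLYPHS)

print(render("A ! B ^ C"))  # A | B ¬ C
```

The point of the last function is that the byte values never change; only the glyph chosen for display does, which is exactly the latitude the standard's wording leaves open.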
It is madness to interpret the definition of "charset" so narrowly that
the well-understood ASCII character set would not qualify.
An important distinction here is that between intent and realization.
If the gap between them is too wide, we just have to throw up our hands...
but if the realization departs from the intent in only small ways, it is
proper to class those as bugs (to be fixed eventually, one hopes) and
judge by the intent. I'm not up on the subtleties of the Devanagari/Han
problems with Unicode, but I strongly suspect that they qualify as bugs,
which we can legitimately overlook, rather than gross differences of
intent, which we can't. Unicode is *meant* to be a unique mapping, and
comes very close to being one.
Henry Spencer at U of Toronto Zoology