Tim Kehres @ ima.com writes:
I have heard that in order to properly implement Unicode, all
implementations need to be able to always recognize and render all
incorporated character sets.
This is Hilarious! And definitely 100% wrong. It is so obviously
tripe cooked up by the competition to make Unicode look unattractive!
Unicode requires NO SUCH THING! DON'T YOU BELIEVE IT for a minute!!
Let's stamp out this horrifying rumor once and for all!
I'm relatively intimate with Unicode and feel reasonably qualified to
comment on this, so I'll take a crack at explaining...
Formal conformance to Unicode means that when you are supposed to be
passing things through, you agree to pass through codes that you
don't understand without damaging them. There is a BIG difference
between agreeing to pass things along uninjured, and claiming to be
able to actually "Interpret" them in any way. ("Interpretation" of a
character means that the system or application understands the
character well enough to display it, sort it, or otherwise operate
upon it with the character's intended semantics.) A Unicode
implementation must be able to pass 16-bit codes through unharmed.
Whether or not it can Interpret those codes is a completely separate
issue. Wow! Why would anyone FORCE all implementations to carry
around all of the baggage for displaying all possible characters?
Some implementations may be able to interpret only Latin 1, or ASCII,
or even the single letter "a"! BUT, if they can pass through 16-bit codes
unharmed, and promise never to display GARBAGE when they don't
understand a character, voila! they're conformant. If an
implementation doesn't understand a particular character, as may
sometimes be the case, it can do any number of things, such as print
a little box or ring the bell. It's just prohibited from spitting
out random garbage AS IF it could interpret the characters.
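
To make the "pass it through unharmed" half concrete, here is a minimal sketch in Python. It assumes text arrives as a list of 16-bit code values; the function name `uppercase_ascii` is made up for illustration and is not part of any Unicode API. The filter interprets only ASCII lowercase letters and leaves every other code exactly as it found it.

```python
def uppercase_ascii(units):
    """Uppercase ASCII a-z; pass every other 16-bit code through untouched.

    `units` is a list of integers, each assumed to be one 16-bit code.
    This is an illustrative filter, not an official conformance test.
    """
    out = []
    for u in units:
        if not 0 <= u <= 0xFFFF:
            raise ValueError("not a 16-bit code: %r" % (u,))
        if 0x61 <= u <= 0x7A:
            # A code this filter interprets: 'a'..'z' becomes 'A'..'Z'.
            out.append(u - 0x20)
        else:
            # Not understood here, so it is copied through unchanged.
            out.append(u)
    return out


# "abc " followed by five Bengali code points (all 16-bit values).
text = "abc \u09ac\u09be\u0982\u09b2\u09be"
units = [ord(c) for c in text]
result = uppercase_ascii(units)
```

The filter knows nothing about Bengali, yet the Bengali codes come out the other end exactly as they went in.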
So let's say I pass you a plaintext Unicode file that contains a
bunch of Bengali. You bring it up on your Unicode system, which has
fonts and facilities for only Latin 1 and Japanese. What do you see?
You might see little boxes wherever I have written a Bengali
character. The system does not PRETEND that it knows what it's
doing. Hence, you DO NOT SEE Japanese or Latin 1, you see your
system's unmistakable signal that it has encountered character codes
which it is unable to interpret for you. Sorry. You could fix the
problem maybe by purchasing a Bengali font or something, but hey, if
someone sends you Bengali once in a blue moon, and you can't read it
anyway, why should you pay good money for a Bengali font?
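
The Bengali scenario above can be sketched roughly as follows, in Python. The code-point ranges standing in for "Latin 1 and Japanese" use present-day Unicode assignments and are a simplification, the names `SUPPORTED_RANGES`, `can_interpret`, and `render` are hypothetical, and the white square U+25A1 is just one possible "little box".

```python
BOX = "\u25a1"  # WHITE SQUARE, used here as the "can't interpret this" signal

# Assumed repertoire of the example system: Latin 1 plus Japanese.
SUPPORTED_RANGES = [
    (0x0020, 0x007E),  # printable ASCII
    (0x00A0, 0x00FF),  # Latin-1 upper half
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK ideographs (covers kanji)
]


def can_interpret(ch):
    """True if this system has a font and facilities for the character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SUPPORTED_RANGES)


def render(text):
    """Show what the system understands; show a box for everything else."""
    return "".join(ch if can_interpret(ch) else BOX for ch in text)


# English, Japanese, then five Bengali code points.
sample = "Hello, \u65e5\u672c\u8a9e, \u09ac\u09be\u0982\u09b2\u09be"
shown = render(sample)
print(shown)
```

The Latin and Japanese text displays normally and each Bengali code shows up as a box. Nothing gets mapped onto some unrelated Japanese or Latin 1 glyph, and the underlying codes in `sample` are left alone.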
Regards,
Rick