perl-unicode

Re: What to do with non-assigned points?

2002-03-18 06:14:00
Nicholas Clark <nick(_at_)ccl4(_dot_)org> writes:
On Mon, Mar 18, 2002 at 08:01:55PM +0900, Dan Kogai wrote:

   That reminds me of this question.  What is a (de jure|de facto)
standard for fallback character?  Is it up to each module?
   FYI  My humble Jcode uses "blank square" (aka Tofu) and MacOS X uses
single '?'.

links uses '*', which I find easier to read than '?'

(for all those *****y MSHTML pages that allege ISO-8859-1 and then use
MS sexed '', where the server should be reporting the page as Windows charset)

I believe that the fallback should be configurable on a per-something
basis (per charset?) which then leaves us debating what the default should
be.
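
As a rough illustration of a per-charset configurable fallback, here is a
minimal sketch in Python rather than Perl. It is not the Encode API: the
FALLBACKS table, make_handler and encode_with_fallback are hypothetical names
invented for this example, and the choice of '?' and '*' simply echoes the
characters mentioned above. Only codecs.register_error and str.encode are real
library calls.

```python
import codecs

# Hypothetical per-charset defaults (an assumption for illustration only).
FALLBACKS = {"ascii": "?", "latin-1": "*"}

def make_handler(fallback):
    """Build an error handler that substitutes `fallback` for each
    character the target charset cannot represent."""
    def handler(err):
        if isinstance(err, UnicodeEncodeError):
            # err.start:err.end is the run of unencodable characters;
            # emit one fallback per character and resume after the run.
            return (fallback * (err.end - err.start), err.end)
        raise err
    return handler

def encode_with_fallback(text, charset):
    """Encode `text`, using the fallback configured for `charset`
    (or '?' if none is configured)."""
    fallback = FALLBACKS.get(charset, "?")
    name = "fallback-" + charset
    codecs.register_error(name, make_handler(fallback))
    return text.encode(charset, errors=name)

sample = "caf\u00e9 \u20ac5"                     # e-acute and the euro sign
print(encode_with_fallback(sample, "ascii"))     # b'caf? ?5'
print(encode_with_fallback(sample, "latin-1"))   # b'caf\xe9 *5'
```

The table could just as well be keyed per module or per filehandle; what the
default entry should be is the open question.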

The existing encoding mechanisms provide a fallback character on a per-encoding
basis for Unicode->xxxx direction. It is usually '?' for ASCII-oids.

What is not yet clear is how the API should enable that vs stopping vs ...
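
One way to picture the "substitute vs stop" choice is the way Python exposes it
through an errors argument on encode. This is only an analogy, not a proposal
for the Perl API; the helper name encode_or_none is made up for the example.

```python
text = "na\u00efve"   # contains U+00EF, which ASCII cannot represent

def encode_or_none(s, charset):
    """Strict mode: stop at the first unencodable character."""
    try:
        return s.encode(charset, errors="strict")
    except UnicodeEncodeError:
        return None

stopped = encode_or_none(text, "ascii")             # None: encoding refused
replaced = text.encode("ascii", errors="replace")   # b'na?ve'
dropped = text.encode("ascii", errors="ignore")     # b'nave'
escaped = text.encode("ascii", errors="xmlcharrefreplace")  # b'na&#239;ve'

print(stopped, replaced, dropped, escaped)
```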

I agree that xxxx->Unicode should use U+FFFD - which is what it is for.
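
For the xxxx->Unicode direction, the behaviour described can be sketched with
Python's decoder, which substitutes U+FFFD when asked to replace malformed
input. Again this is an analogy in another language, not Perl's Encode; the
sample bytes are an assumption chosen to resemble Windows-charset quotes sent
under the wrong label.

```python
# 0x93 and 0x94 are curly quotes in Windows-1252 but are not valid
# as standalone bytes in UTF-8.
raw = b"\x93hi\x94"

# Decoding with the wrong charset: each undecodable byte becomes U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset the bytes were actually written in.
right = raw.decode("cp1252")

print(ascii(wrong))   # '\ufffdhi\ufffd'
print(ascii(right))   # '\u201chi\u201d'
```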

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/


