Re: digits in iso-8859-6 to utf8 conversion

Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:

On Saturday, May 24, 2003, at 01:37  AM, David Graff wrote:

In the process, I chose to set up the digit-related entries in my
"iso-8859-6-nd.ucm" file as follows:

[snip]

The point here is that when Arabic text in Unicode happens to contain
Arabic-Indic digit characters, and we want to convert to iso-8859-6, it
would seem a good idea for these multi-byte digit characters to be
translated into their ASCII equivalents, rather than being treated as
exceptions (replaced by "?", or throwing an error if Encode's CHECK
argument is set to do that).
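
The fallback described above can be sketched outside Perl. The following is a minimal illustration in Python, not the .ucm change under discussion and not Encode's API: the handler name "arabic_digit_fallback" and the helper encode_lax are hypothetical. It assumes Python's standard iso-8859-6 codec, which rejects U+0660..U+0669, and substitutes ASCII digits only when the strict mapping fails.

```python
import codecs

# Arabic-Indic digits U+0660..U+0669 mapped to ASCII '0'..'9'.
ARABIC_INDIC_TO_ASCII = {chr(0x0660 + i): str(i) for i in range(10)}


def arabic_digit_fallback(err):
    """Encode-error handler (hypothetical name): swap Arabic-Indic digits
    for ASCII digits and re-raise for any other unencodable character."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    try:
        replacement = "".join(ARABIC_INDIC_TO_ASCII[ch] for ch in bad)
    except KeyError:
        # Something other than an Arabic-Indic digit: stay strict.
        raise err from None
    # Return the substitute text and the position at which to resume.
    return replacement, err.end


codecs.register_error("arabic_digit_fallback", arabic_digit_fallback)


def encode_lax(text):
    """Encode to ISO-8859-6, falling back to ASCII digits where needed."""
    return text.encode("iso-8859-6", errors="arabic_digit_fallback")
```

Because the substitution lives in an error handler, characters that the standard table already covers are untouched, and anything else that cannot be encoded still raises an error.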


Personally I like your idea, but I am rather too scared to tweak 
ISO-8859-*.  Another problem is that some people would rather catch 
those as errors, so I conclude we had better leave ISO-8859-* alone.


I like the idea too - it is in Perl's spirit of being pragmatic 
and liberal in what it accepts.

There is scope for several of these "fallback" enhancements - 
in particular, Windows has a habit of calling its variant of latin1 
"iso-8859-1", and when it does so "smart quotes" and the em dash in 
particular get rendered as the U+FFFD replacement character.

Given that these fallbacks are marked as such, we could have a strictness 
pragma for the encoding which enabled them...
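
One way to picture such a strictness switch, sketched in Python rather than as a Perl pragma: the function decode_latin1 and its lax parameter are hypothetical names for illustration. With lax off, the decode follows the standard table; in Python that leaves bytes 0x80..0x9F as C1 control characters. With lax on, those bytes are reinterpreted through the Windows cp1252 table wherever cp1252 defines them.

```python
def build_cp1252_fallback():
    """Map C1 code points U+0080..U+009F to their cp1252 meaning, where defined."""
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # unassigned in cp1252: leave the C1 control as-is
    return table


CP1252_FALLBACK = build_cp1252_fallback()


def decode_latin1(data, lax=False):
    """Decode bytes labelled iso-8859-1; apply cp1252 fallbacks only if lax."""
    text = data.decode("iso-8859-1")  # strict, standard mapping
    if lax:
        # After a latin1 decode, byte value and code point coincide,
        # so the table can be applied directly to the decoded text.
        text = text.translate(CP1252_FALLBACK)
    return text
```

Keeping the fallback behind an explicit flag means callers who want to catch mislabelled input as an anomaly still can.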


My position on mappings is to respect the standard as much as 
possible.  Well, it is more like I don't want to be held responsible 
for the mappings.  So if you want to change the default behavior of 
iso-8859-6, you should convince the Unicode Consortium or ISO to update 
the official mapping.

On the other hand, you are free to distribute, say, Encode::Arabic that 
implements exactly that.


So maybe that is what we should do - leave standard encodings as they are
and have some "lax" encodings of our own.
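
As a rough sketch of that arrangement, again in Python rather than Perl's Encode: the standard codec is left alone and a separately named lax codec is registered next to it. The name "iso-8859-6-lax" and the helper functions are invented for this example.

```python
import codecs

# Arabic-Indic digits U+0660..U+0669 -> ASCII digits, as a translate() table.
_DIGITS = {0x0660 + i: ord("0") + i for i in range(10)}


def _lax_encode(text, errors="strict"):
    # Fold the digits first, then defer to the unmodified standard codec.
    data = codecs.encode(text.translate(_DIGITS), "iso-8859-6", errors)
    return data, len(text)


def _lax_decode(data, errors="strict"):
    # Decoding needs no fallback; reuse the standard table unchanged.
    return codecs.decode(data, "iso-8859-6", errors), len(data)


def _search(name):
    # Codec names arrive normalized; accept hyphen and underscore forms.
    if name in ("iso-8859-6-lax", "iso_8859_6_lax"):
        return codecs.CodecInfo(
            encode=_lax_encode, decode=_lax_decode, name="iso-8859-6-lax"
        )
    return None


codecs.register(_search)
```

Callers opt in by name, for example "\u0661".encode("iso-8859-6-lax"), while "iso-8859-6" itself keeps the official mapping and still reports the digit as an error.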

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/