Re: digits in iso-8859-6 to utf8 conversion

On Saturday, May 24, 2003, at 01:37  AM, David Graff wrote:

In the process, I chose to set up the digit-related entries in my
"iso-8859-6-nd.ucm" file as follows:

[snip]

The point here is that when Arabic text in Unicode happens to contain
Arabic-Indic digit characters, and we want to convert to iso-8859-6, it
would seem a good idea for these multi-byte digit characters to be
translated into their ASCII correlates, rather than being treated as
exceptions (replaced by "?", or throwing an error if encode's "CHECK"
flag is set to do that).

Personally I like your idea, but I am rather too scared to tweak
ISO-8859-*. Another problem is that some people would rather catch
those as errors, so I conclude we had better leave ISO-8859-* alone.


FYI you can say

$utf8_with_arabic =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;

to achieve the same result.
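
A slightly fuller sketch of the same idea, in case it helps. The
helper name fold_arabic_digits and the sample string are made up for
illustration and are not part of Encode; the script only assumes a
stock Encode with the iso-8859-6 encoding available.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Encode): fold Arabic-Indic digits
# (U+0660..U+0669) and Extended Arabic-Indic digits (U+06F0..U+06F9)
# to ASCII 0-9, leaving everything else untouched.
sub fold_arabic_digits {
    my ($text) = @_;
    $text =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;
    return $text;
}

# Sample input: an Arabic word followed by the digits 2003 written
# with Arabic-Indic digit characters.
my $utf8_with_arabic =
    "\x{0633}\x{0646}\x{0629} \x{0662}\x{0660}\x{0660}\x{0663}";

my $folded = fold_arabic_digits($utf8_with_arabic);

# After folding, every character has an ISO-8859-6 mapping, so the
# strict FB_CROAK check does not die. LEAVE_SRC keeps $folded intact.
my $bytes = encode('iso-8859-6', $folded, FB_CROAK | LEAVE_SRC);
```

Doing the folding before encode() keeps the standard mapping
untouched, so anyone who wants unmappable digits reported as errors
still gets that by skipping the helper.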

Additionally, you can use FB_HTMLCREF and FB_XMLCREF so you can
preserve "0" and "&#x0660;" at the same time (handy if you need your
HTML/XML encoded in ISO-8859-6 but still want your Arabic-Indic digits).
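
For instance, a minimal sketch along these lines (the sample string is
made up; FB_HTMLCREF, FB_XMLCREF and LEAVE_SRC are the constants Encode
exports):

```perl
use strict;
use warnings;
use Encode qw(encode FB_HTMLCREF FB_XMLCREF LEAVE_SRC);

# Sample input: an Arabic word, the year in Arabic-Indic digits,
# and the same year in ASCII digits.
my $text =
    "\x{0633}\x{0646}\x{0629} \x{0662}\x{0660}\x{0660}\x{0663} / 2003";

# Characters with no ISO-8859-6 mapping become numeric character
# references instead of "?"; mapped characters are encoded as usual.
# LEAVE_SRC stops encode() from modifying $text in place.
my $html = encode('iso-8859-6', $text, FB_HTMLCREF | LEAVE_SRC); # decimal refs
my $xml  = encode('iso-8859-6', $text, FB_XMLCREF  | LEAVE_SRC); # hex refs
```

Both results are plain ISO-8859-6 byte strings, so a browser or XML
parser can recover the original digits from the references while the
ASCII "2003" passes through unchanged.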

Some people might object to this being a "default" behavior for encoding
into 8859-6, but if it were available as an alternative, I think a lot
of people could find it useful.  (Personally, I'd vote for this to be
the default behavior.)

My position toward mappings is to respect the standard as much as
possible. Well, it is more like I don't want to be held responsible
for the mappings. So if you want to change the default behavior of
iso-8859-6, you should convince the Unicode Consortium or ISO to
update the official mapping.

On the other hand, you are free to distribute, say, Encode::Arabic that
implements exactly that.


Dan the Encode Maintainer