perl-unicode

Re: digits in iso-8859-6 to utf8 conversion

2003-05-23 11:30:06
On Saturday, May 24, 2003, at 01:37  AM, David Graff wrote:
> In the process, I chose to set up the digit-related entries in my
> "iso-8859-6-nd.ucm" file as follows:

[snip]

> The point here is that when Arabic text in Unicode happens to contain
> Arabic-Indic digit characters, and we want to convert to iso-8859-6, it
> would seem a good idea for these multi-byte digit characters to be
> translated into their ASCII correlates, rather than being treated as
> exceptions (replaced by "?", or throwing an error if encode's "CHECK"
> flag is set to do that).

Personally I like your idea, but I am rather too scared to tweak ISO-8859-*. Another problem is that some people would rather catch those as errors, so I conclude we had better leave ISO-8859-* alone.

FYI you can say

$utf8_with_arabic =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;

to achieve the same result.
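
For illustration, here is a minimal sketch of that pre-translation step followed by the actual conversion. The sample string and the variable names other than $utf8_with_arabic are assumptions made up for this example; FB_CROAK | LEAVE_SRC is used only to show that nothing is left unmappable after the tr.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Sample input (an assumption for illustration): ALEF, a space,
# Arabic-Indic 1 2 3, a space, Extended Arabic-Indic 4 5.
my $utf8_with_arabic = "\x{0627} \x{0661}\x{0662}\x{0663} \x{06F4}\x{06F5}";

# Fold both digit ranges onto ASCII 0-9 in a copy of the string.
(my $folded = $utf8_with_arabic)
    =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;

# Every character in $folded now has a slot in ISO-8859-6, so the
# strict FB_CROAK check does not die. LEAVE_SRC keeps $folded intact.
my $bytes = encode('iso-8859-6', $folded, FB_CROAK | LEAVE_SRC);

# Show the resulting bytes in hex.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
# prints: C7 20 31 32 33 20 34 35
```

Without the tr line, the same encode call would die on the first Arabic-Indic digit, which is the behavior some people want to keep.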
Additionally, you can use FB_HTMLCREF or FB_XMLCREF as the CHECK argument, so you can preserve "0" and "&#x660;" at the same time (handy if you need your HTML/XML encoded in ISO-8859-6 but still want your Arabic-Indic digits).
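
A short sketch of what the two fallbacks produce, assuming a made-up sample string that mixes an Arabic letter, an Arabic-Indic zero, and an ASCII zero. The helper show() is hypothetical and exists only to make the non-ASCII byte visible.

```perl
use strict;
use warnings;
use Encode qw(encode FB_HTMLCREF FB_XMLCREF);

# ALEF, ARABIC-INDIC DIGIT ZERO, ASCII "0" (sample data for this sketch).
my $text = "\x{0627}\x{0660}0";

# Unmappable characters become decimal (HTML) or hexadecimal (XML)
# character references instead of "?".
my $html = encode('iso-8859-6', $text, FB_HTMLCREF);
my $xml  = encode('iso-8859-6', $text, FB_XMLCREF);

# Hypothetical helper: escape non-printable bytes for display.
sub show {
    my ($bytes) = @_;
    $bytes =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $bytes;
}

print show($html), "\n";   # prints: \xC7&#1632;0
print show($xml),  "\n";   # prints: \xC7&#x660;0
```

The ALEF is encoded as the single byte 0xC7, the ASCII zero passes through, and only the Arabic-Indic zero is turned into a reference.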

> Some people might object to this being a "default" behavior for encoding
> into 8859-6, but if it were available as an alternative, I think a lot
> of people could find it useful.  (Personally, I'd vote for this to be
> the default behavior.)

My position toward mappings is to respect the standard as much as possible. Well, it is more like I don't want to be held responsible for the mappings. So if you want this to be the default behavior of iso-8859-6, you should convince the Unicode Consortium or ISO to update the official mapping.

On the other hand, you are free to distribute, say, Encode::Arabic that implements exactly that.
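
As a rough sketch only, such a module could be a thin Encode::Encoding subclass that folds the digits and then delegates to the stock iso-8859-6 table. The package name My::Encode::ArabicND, the encoding name x-iso-8859-6-nd, and the helper _modifies_src are all hypothetical, and a .ucm-based build like the one described earlier in the thread would be a different (compiled) route to the same result.

```perl
use strict;
use warnings;
use Encode ();

{
    # Hypothetical package and encoding names, chosen for this sketch only.
    package My::Encode::ArabicND;
    use parent 'Encode::Encoding';

    # Register the object under a private, non-standard encoding name.
    __PACKAGE__->Define('x-iso-8859-6-nd');

    # True when CHECK asks for the unprocessed remainder to be left
    # in the source string (i.e. CHECK is set and LEAVE_SRC is not).
    sub _modifies_src {
        my ($chk) = @_;
        return $chk && !ref($chk) && !($chk & Encode::LEAVE_SRC());
    }

    sub encode {
        my ($self, $str, $chk) = @_;
        # Fold Arabic-Indic and Extended Arabic-Indic digits to ASCII,
        # then let the standard table do the real work.
        $str =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;
        my $bytes = Encode::encode('iso-8859-6', $str, $chk || 0);
        $_[1] = $str if _modifies_src($chk);
        return $bytes;
    }

    sub decode {
        my ($self, $bytes, $chk) = @_;
        # Decoding is delegated unchanged to the standard table.
        my $text = Encode::decode('iso-8859-6', $bytes, $chk || 0);
        $_[1] = $bytes if _modifies_src($chk);
        return $text;
    }
}

# Usage: ALEF, Arabic-Indic 3, Extended Arabic-Indic 7.
my $text  = "\x{0627}\x{0663}\x{06F7}";
my $bytes = Encode::encode('x-iso-8859-6-nd', $text);
my $back  = Encode::decode('x-iso-8859-6-nd', $bytes);
```

Here $bytes holds 0xC7 followed by ASCII "37", and decoding gives back the ALEF plus ASCII digits, so the digit folding is one-way. Keeping it under a separate name leaves the standard iso-8859-6 mapping untouched.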

Dan the Encode Maintainer