On Saturday, May 24, 2003, at 01:37 AM, David Graff wrote:
In the process, I chose to set up the digit-related entries in my
"iso-8859-6-nd.ucm" file as follows:
[snip]
The point here is that when Arabic text in Unicode happens to contain
Arabic-Indic digit characters, and we want to convert to iso-8859-6, it
would seem a good idea for these multi-byte digit characters to be
translated into their ASCII correlates, rather than being treated as
exceptions (replaced by "?", or throwing an error if encode's "CHECK"
flag is set to do that).
Personally I like your idea, but I am rather too scared to tweak
ISO-8859-*. Another problem is that some people would rather catch
those as errors, so I conclude we had better leave ISO-8859-* alone.
FYI you can say
$utf8_with_arabic =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;
to achieve the same result.
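For completeness, here is a minimal sketch of doing the tr/// step and then encoding strictly, so that anything else unmappable still croaks. The helper name arabic_digits_to_ascii and the sample string are made up for illustration; only encode() and FB_CROAK come from Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of Encode): returns a copy of the text with
# Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9)
# digits replaced by ASCII 0-9. The caller's string is left untouched.
sub arabic_digits_to_ascii {
    my ($text) = @_;
    (my $copy = $text) =~ tr/\x{0660}-\x{0669}\x{06F0}-\x{06F9}/0-90-9/;
    return $copy;
}

# Sample input (an assumption for this sketch): three Arabic letters,
# a space, then the Arabic-Indic digits for 2003.
my $utf8_with_arabic = "\x{0633}\x{0646}\x{0629} \x{0662}\x{0660}\x{0660}\x{0663}";

# With FB_CROAK, any character that still has no ISO-8859-6 mapping dies
# instead of being silently replaced.
my $bytes = encode('iso-8859-6',
                   arabic_digits_to_ascii($utf8_with_arabic),
                   Encode::FB_CROAK());
```

This keeps the official mapping as it is and makes the digit folding an explicit choice by the caller.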
Additionally, you can use FB_HTMLCREF and FB_XMLCREF so you can
preserve "0" and "٠" at the same time (handy if you need your
HTML/XML encoded in ISO-8859-6 but still want your Arabic-Indic digits).
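A short sketch of what that looks like, assuming a made-up sample string holding an ASCII zero and an Arabic-Indic zero (U+0660). The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample input (an assumption for this sketch): ASCII "0", a space,
# then ARABIC-INDIC DIGIT ZERO.
my $text = "0 \x{0660}";

# Characters with no ISO-8859-6 mapping become decimal character
# references instead of "?".
my $html = encode('iso-8859-6', $text, Encode::FB_HTMLCREF());

# Same idea, but with hexadecimal character references.
my $xml = encode('iso-8859-6', $text, Encode::FB_XMLCREF());

print "$html\n";
print "$xml\n";
```

Both results are plain ASCII here, so the ASCII "0" passes through unchanged while the Arabic-Indic zero survives as a reference that a browser or XML parser can turn back into the original character.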
Some people might object to this being a "default" behavior for encoding
into 8859-6, but if it were available as an alternative, I think a lot
of people could find it useful. (Personally, I'd vote for this to be
the default behavior.)
My position toward mappings is to respect the standard as much as
possible. Well, it is more like I don't want to be held responsible
for the mappings. So if you want this to be the default behavior of
iso-8859-6, you should convince the Unicode Consortium or ISO to update
the official mapping.
On the other hand, you are free to distribute, say, Encode::Arabic that
implements exactly that.
Dan the Encode Maintainer