perl-unicode

Re: digits in iso-8859-6 to utf8 conversion

2003-05-21 03:30:05
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:
On Wednesday, May 21, 2003, at 03:21  PM, Jarkko Hietaniemi wrote:
I agree that there should be no reason to convert the ISO 8859-6
decimal digits to the Unicode Arabic-Indic digits.  I don't know where
this curious data might actually be coming from, since most of our
data comes from the legacy mapping data on the Unicode Consortium's
web site, and there it's simply a matter of 0x31 -> U+0031.  Thanks for
the catch.
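
For illustration only, here is a minimal sketch of the mapping described
above: the ISO 8859-6 bytes 0x30..0x39 should decode to the ordinary
digits U+0030..U+0039, not to the Arabic-Indic digits U+0660..U+0669.
It uses Python's standard "iso8859_6" codec as a stand-in, not Perl's
Encode module; the variable names are made up for this example.

```python
import unicodedata

# Bytes 0x30..0x39 are the decimal digits in ISO 8859-6.
raw = bytes(range(0x30, 0x3A))

# Decode with Python's standard ISO 8859-6 codec (a stand-in for Encode).
decoded = raw.decode("iso8859_6")

# The Arabic-Indic digits that the conversion should NOT produce.
arabic_indic = {chr(cp) for cp in range(0x0660, 0x066A)}

# Show each byte, the code point it decodes to, and that character's name.
for byte, ch in zip(raw, decoded):
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# True if any decoded character is an Arabic-Indic digit.
has_arabic_indic = any(ch in arabic_indic for ch in decoded)
print("Arabic-Indic digits produced:", has_arabic_indic)
```

The same kind of check could be run against any converter to see whether
its ISO 8859-6 table carries the unexpected digit mapping.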

I wonder where they came from, too.  I think it slipped in when Encode 
switched from Tcl-based .enc to .ucm.  I will check ICU and regen 
ISO-8859-X and release Encode 1.95 as soon as I can because I consider 
this rather severe.

Hmm - Tcl/Tk might have done that deliberately as its primary use of 
encodings is for font glyph lookup. So it may have wanted to use the 
Arabic-Indic codepoints to find those glyphs.
(Though as Tk does not do BIDI or deal with Arabic initial/medial/final 
presentation forms, the value of having Arabic digit glyphs in a sea of 
mis-presented characters seems marginal.)

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/