perl-unicode

Re: Handling MacArabic in perl 5.8.0

2003-01-26 06:30:05

> I understand that Mac developers would consider a conversion to unicode
> "lossy" or "non-reversible" if the directionality indicators are not
> preserved somehow (using RLE/LRE or RLO/LRO), and this might constitute
> an "algorithmic" approach that 'enc2xs' would not support.
>
> Is there a work-around that will allow all the MacArabic code points to
> be converted successfully, given that their respective character
> semantics are all well established in unicode?  Even a "lossy"
> conversion (ditching the directionality specs) would be better than the
> failures I'm getting now.

(1) If you can accept losing the text-direction information,
how about using fallback mappings?

e.g.

0x2B    <LR>+0x002B   # PLUS SIGN, left-right
0xAB    <RL>+0x002B   # PLUS SIGN, right-left

in http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ARABIC.TXT
can be converted to

<U002B> \x2B |0 # PLUS SIGN
<U002B> \xAB |3 # PLUS SIGN, right-left

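in Encode/ucm/macArabic.ucm. Here |0 marks a round-trip mapping,
while |3 marks a one-way fallback used only when decoding: \xAB
decodes to U+002B, but U+002B always encodes back to \x2B.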
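Applying idea (1) by hand to every duplicated entry is tedious, so the
conversion can be scripted. Below is a minimal Python sketch (not part
of Encode or enc2xs) that assumes the ARABIC.TXT line layout shown above
and handles only entries mapping to a single code point. The function
name apple_to_ucm and its selection rule (the first byte not tagged <RL>
becomes the |0 round-trip mapping; every other byte for the same code
point becomes a |3 decode-only fallback) are illustrative assumptions.

```python
import re

# Matches one data line of Apple's ARABIC.TXT (assumed layout), e.g.
#   0xAB	<RL>+0x002B	# PLUS SIGN, right-left
# Only single-code-point targets are handled; other lines are skipped.
LINE_RE = re.compile(
    r"^0x([0-9A-Fa-f]{2})\s+(?:<(LR|RL)>\+)?0x([0-9A-Fa-f]{4})\s*(?:#\s*(.*))?$"
)


def apple_to_ucm(lines):
    """Convert ARABIC.TXT-style lines into ucm mapping lines.

    For each Unicode target, one byte gets a round-trip mapping (|0):
    the first byte not tagged <RL>, or the first byte if all are <RL>.
    Every other byte mapping to the same code point becomes a
    decode-only fallback (|3), so the direction information is dropped.
    """
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            byte, direction, ucs, comment = m.groups()
            entries.append((int(byte, 16), direction, int(ucs, 16), comment or ""))

    # Choose the round-trip byte for every Unicode code point.
    primary = {}
    for byte, direction, ucs, _ in entries:
        if ucs not in primary:
            primary[ucs] = (byte, direction)
        elif primary[ucs][1] == "RL" and direction != "RL":
            primary[ucs] = (byte, direction)

    out = []
    for byte, _direction, ucs, comment in entries:
        flag = 0 if primary[ucs][0] == byte else 3
        text = f"<U{ucs:04X}> \\x{byte:02X} |{flag}"
        if comment:
            text += f" # {comment}"
        out.append(text)
    return out


if __name__ == "__main__":
    sample = [
        "0x2B\t<LR>+0x002B\t# PLUS SIGN, left-right",
        "0xAB\t<RL>+0x002B\t# PLUS SIGN, right-left",
    ]
    for ucm_line in apple_to_ucm(sample):
        print(ucm_line)
```

Run on the two sample entries, it prints the same |0 and |3 mappings
as the hand-written ucm lines (keeping the original comments), ready to
be pasted into a .ucm file and compiled with enc2xs.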

(2) I've written a quick module for MacArabic (attached to this
mail) that works with Perl 5.6.1 or later.

I hope it can be built on a Mac, but I haven't worked
with the Macintosh, and I'm not well-informed about either
the Macintosh or bidi, so please let me know if anything
is wrong. (At least, this version doesn't support
embedding or nesting of directions.)

SADAHIRO Tomoyuki

Attachment: Lingua-AR-MacArabic-0.00.tar.gz
Description: Binary data
