Chris,
I think what you've done is very interesting, and
useful, so what I have to say below is not intended
as criticism of your work in any way.
It's curious that the Arabic Presentation Forms got
into Unicode at all, and a number of people still think
it was a mistake, a sell-out. One of the Fathers of Unicode
told me they were deprecated. Even the Unicode specification
explains their presence rather apologetically.
According to the True Gospel of Unicode, Arabic text
should always be _encoded_ in the basic \u06HH characters,
and should not need to be re-encoded. The rendering
(BIDI, shaping, ligatures) should be done in a separate
rendering engine that does whatever it needs to do internally
but does _not_ change the original encoding of the text.
The Unicode Presentation Forms are kludges for systems
that don't implement a proper rendering engine (and it
sounds like you are working within the limitations of
such a system). According to the True Gospel, the Arabic
Presentation Forms are an unjustified waste of code space.
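
The relationship between the two encodings can be illustrated with a
minimal sketch. It assumes Python and its standard unicodedata module,
neither of which is used in this thread; to_basic is a hypothetical
helper name. Each presentation form carries a compatibility decomposition
back to its basic character, so NFKC normalization recovers the basic
encoding:

```python
import unicodedata

# The word "kitab" written with Arabic Presentation Forms-B characters:
# KAF initial, TEH medial, ALEF final, BEH isolated.
shaped = "\uFEDB\uFE98\uFE8E\uFE8F"

# The same word in the basic U+06xx characters.
basic = "\u0643\u062A\u0627\u0628"


def to_basic(text):
    """Fold presentation forms back to basic characters (hypothetical helper).

    NFKC is a blunt tool: it also folds every other compatibility
    character in the text, not only the Arabic presentation forms.
    """
    return unicodedata.normalize("NFKC", text)


# Show the compatibility decomposition recorded for each presentation form.
for ch in shaped:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")

assert to_basic(shaped) == basic
```

The round trip only goes one way for free: getting from the basic
characters to the positional forms needs a shaping step, which is what
a rendering engine (or a shaping module) supplies.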
From a purely aesthetic point of view, rendering each
basic Arabic character with one of only four glyphs (isolated,
initial, medial, final) yields only marginally acceptable
results. It's crude and butt ugly. Before you get defensive,
let me point out that I myself use a four-glyph-to-each-char font
for rendering Arabic on my own webpage, www.arabic-morphology.com.
So I'm not criticising you; my own rendered Arabic is crude and
butt ugly. I'm just saying that really good Arabic
rendering would need to be far more subtle and flexible than
any four-glyph font could allow. Good Arabic rendering also
needs to use a lot of ligatures, not just the laam-alif ones.
The Arabic Presentation Forms also provide a lot of ligatures,
but again a good rendering system would need ligatures not
included in the closed set of Arabic Presentation Forms.
As an example, the ArabTeX rendering system (a package for
TeX and LaTeX for rendering Arabic) uses a font of about
250 basic glyphs, with an auxiliary mechanism to connect
the basic shapes where necessary with smooth curves.
The ultimate Arabic rendering engine is probably that of Thomas
Milo, who is inspired by the best of the Turkish calligraphers.
His rendering algorithms first draw the basic skeleton of the
word according to the best classical proportions, and then
fit in the dots and diacritics afterward (this is how real
calligraphers do it, and it requires some squeezing and subtle
contextual adjustments). A lot of people think that Milo's
work is _too_ calligraphic, and unsuitable for plain text
like newspapers and even most books, but he's a constant
reminder that most computer rendering of Arabic (including
my own) is still pretty crude.
Keep up the good work,
Ken
To: perl-unicode(_at_)perl(_dot_)org
From: "Chris Whiting" <bb(_at_)yahoo(_dot_)com>
Subject: Re: Bidirectional (bidi) Support?
Date: Fri, 24 Oct 2003 22:24:38 -0400
"Bob Hallissy" <Bob_Hallissy(_at_)sil(_dot_)org> wrote in message
news:OFA3E32D4C.D67B7CEA->ON80256DC7(_dot_)0038F9CD(_at_)notes(_dot_)sil(_dot_)org(_dot_)(_dot_)(_dot_)
> On 21/10/2003 01:09:32 "Chris Whiting" wrote:
>> I have implemented ... an Arabic shaping algorithm in
>> Perl and was wondering if it would be useful to upload it to cpan.
> I presume your algorithm depends on the Arabic presentation forms available
> as separately encoded characters in Unicode. If this is the case, and given
> that lots of Arabic characters in Unicode do not have all their
> presentation forms separately encoded, nor will any new presentation forms
> be added to the standard, it would seem such an algorithm would be of
> limited, and perhaps, misleading help.
The algorithm, like all the others I have seen, converts Arabic characters
in the \x{06--} range to Arabic Presentation Forms-A (starting at \x{FB50})
or Forms-B (starting at \x{FE70}) characters, depending on whether each one
is in isolated, initial, medial, or final position per the Unicode standard.
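
The per-position substitution described above can be sketched roughly as
follows. This is an illustration only, not the Perl module under
discussion: it is written in Python, the names FORMS and shape are
hypothetical, and the table is derived from the decompositions in
Python's unicodedata module rather than typed in by hand. It covers only
letters with single-character forms in Presentation Forms-B, treats
diacritics as if they were non-joining letters, builds no lam-alif
ligatures, and does no bidi reordering.

```python
import unicodedata

POSITIONS = ("isolated", "final", "initial", "medial")

# Map each basic letter to its positional forms, e.g.
# FORMS["\u0628"]["initial"] == "\uFE91" (BEH initial form).
FORMS = {}
for cp in range(0xFE70, 0xFF00):
    parts = unicodedata.decomposition(chr(cp)).split()
    # Keep entries like "<final> 0627"; skip ligatures and unassigned points.
    if len(parts) == 2 and parts[0].strip("<>") in POSITIONS:
        base = chr(int(parts[1], 16))
        FORMS.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)


def joins_next(ch):
    """True if ch can connect to the letter that follows it."""
    return "initial" in FORMS.get(ch, {})


def joins_prev(ch):
    """True if ch can connect to the letter that precedes it."""
    return "final" in FORMS.get(ch, {})


def shape(text):
    """Replace basic Arabic letters with positional presentation forms."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)  # not a letter this sketch knows how to shape
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        to_prev = joins_prev(ch) and joins_next(prev)
        to_next = joins_next(ch) and joins_prev(nxt)
        if to_prev and to_next:
            position = "medial"
        elif to_prev:
            position = "final"
        elif to_next:
            position = "initial"
        else:
            position = "isolated"
        out.append(forms.get(position, ch))
    return "".join(out)


# "kitab": KAF initial, TEH medial, ALEF final, BEH isolated.
assert shape("\u0643\u062A\u0627\u0628") == "\uFEDB\uFE98\uFE8E\uFE8F"
```

Letters that have no presentation forms in this block fall through
unchanged, which is the limitation raised in the quoted message: the set
of encoded forms is closed, so a table-driven shaper of this kind can only
ever cover the letters the standard happens to include.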
I am not sure that I understand your point. Isn't this the purpose of the
Arabic Presentation Forms?
I have found several different algorithms that do this but have found none on
CPAN. Note that I may help someone else (who has a simpler module than
mine) to upload his files.
I use the modules when rendering text on images using ImageMagick, which
performs neither shaping nor the bidi algorithm. Without this shaping module
many of the characters are not rendered correctly. In my case the shaping
module is neither limited nor misleading but required.
Perhaps, you have a better way?
Chris
Bob
**********************************************************************
Kenneth R. Beesley
ken(_dot_)beesley(_at_)xrce(_dot_)xerox(_dot_)com
Xerox Research Centre Europe Tel from France: 04 76 61 50 64
6, chemin de Maupertuis Tel from Abroad: +33 4 76 61 50 64
38240 MEYLAN Fax from France: 04 76 61 50 99
France Fax from Abroad: +33 4 76 61 50 99
XRCE page: http://www.xrce.xerox.com
Personal page: http://www.xrce.xerox.com/people/beesley/beesley.html
**********************************************************************