perl-unicode

Re: Handling MacArabic in perl 5.8.0

2003-01-28 00:30:04

That is great -- THANK YOU!!  For general reference, I applied the
module's "decodeMacArabic" function to my test script as shown below,
and it works as desired.  The last few lines of the main "for" loop were
added to confirm that the module properly minimizes the use of
"LRO ... PDF" and "RLO ... PDF" directional controls for contiguous
strings of affected characters -- a tremendous benefit!  (Note that I'm
not working on a Mac -- I'm using Solaris here, and I just need to cope
with data that comes from a Mac.)
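
For anyone who wants to check the directional controls without reading
the hex dump by eye, here is a rough sketch that counts LRO (U+202D),
RLO (U+202E) and PDF (U+202C) in an already-decoded string.  The helper
name "count_bidi_controls" is my own invention and is not part of the
module, and the sample string is hand-built for illustration rather
than real decodeMacArabic output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::AR::MacArabic): count the
# bidi override controls in an already-decoded character string.
sub count_bidi_controls {
    my ($str) = @_;
    my %count;
    # "() = ... /g" forces list context so the match count is returned.
    $count{LRO} = () = $str =~ /\x{202D}/g;   # LEFT-TO-RIGHT OVERRIDE
    $count{RLO} = () = $str =~ /\x{202E}/g;   # RIGHT-TO-LEFT OVERRIDE
    $count{PDF} = () = $str =~ /\x{202C}/g;   # POP DIRECTIONAL FORMATTING
    return \%count;
}

# Hand-built sample: a single RLO ... PDF pair wrapping three characters.
my $sample = "abc\x{202E}123\x{202C}def";
my $c = count_bidi_controls($sample);
printf "LRO=%d RLO=%d PDF=%d\n", @{$c}{qw/LRO RLO PDF/};
```

If the controls are being minimized, a contiguous run of affected
characters should contribute one override and one PDF, not one pair per
character.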

BTW, I just noticed that the Unicode web site now has a more recent
version of the APPLE/ARABIC.TXT mapping page than the one I cited
earlier, and the new version offers improved/expanded commentary:
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ARABIC.TXT
(dated Dec. 19, 2002).

--- enhanced "test-encode.perl", using Tomoyuki's Lingua::AR::MacArabic ---

use strict;
use Encode;
use Lingua::AR::MacArabic;

my $utf8_out;
my @octet_in;

push @octet_in, chr($_) for ( 0x20 .. 0x7e, 0x80 .. 0xff );

for my $table ( qw/cp1256 MacArabic/ )
{
    my @succ = ();
    my @fail = ();
    my @msgs = ();
    for ( @octet_in )
    {
        my $char = $_;
        # Block eval traps any die/croak without re-parsing a code string.
        if ( $table eq 'cp1256' ) {
            eval { $utf8_out = decode( $table, $char, Encode::FB_CROAK ) };
        } else {
            eval { $utf8_out = decodeMacArabic( $char ) };
        }
        if ( $@ ) {
            push @fail, sprintf( " %2.2x\n", ord( $_ ));
            push @msgs, $@;
        }
        else {
            my $uhex = join( ' ', map { sprintf( "%4.4x", ord( $_ )) }
                             split( //, $utf8_out ));
            push @succ, sprintf( " %2.2x => %s\n", ord( $_ ), $uhex );
        }
    }
    print "decode via $table succeeded on ", scalar @succ, " codes\n";
    print join '', @succ if ( @succ );
    print "decode via $table failed on ", scalar @fail, " codes\n";
    print join '', @fail if ( @fail );

    my $bigstring = join( '', @octet_in );
    my $ubigstring;
    if ( $table eq 'cp1256' ) {
        $ubigstring = decode( $table, $bigstring );
    } else {
        $ubigstring = decodeMacArabic( $bigstring );
    }
    my $uhex = join( ' ', map { sprintf( "%4.4x", ord( $_ )) } 
                     split( //, $ubigstring ));
    print $uhex, "\n";
}

