perl-unicode

Re: intelligent lexically encoding

2005-09-08 01:43:36
On Sep 08, 2005, at 12:39 , Jerzy Giergiel wrote:
Neither of those fallbacks is OK; I want á converted to the accent-stripped version of itself, i.e. a. The second solution isn't very helpful either: it's basically a tr replacement table, which is not much fun to write when the majority of the upper 128 characters need to be converted. There's gotta be a simpler and more elegant solution. Thanks anyway.

Well, it's not that hard to write a tr version if you let perl do the job.

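#!/usr/bin/perl
use strict;
use warnings;
use charnames qw(:full);
my ($from, $to) = ('', '');
# Walk the upper half of Latin-1 and keep only the accented letters.
for my $ord (0x80..0xff){
    my $chr = chr $ord;
    # viacode() returns undef when it knows no name for the code point
    my $name = charnames::viacode($ord) or next;
    # e.g. "LATIN SMALL LETTER A WITH ACUTE" captures ("SMALL", "A")
    $name =~ /(SMALL|CAPITAL) LETTER ([A-Z]) WITH/i or next;
    # names come back in upper case, so small letters must be lower-cased
    my $az = $1 eq 'CAPITAL' ? uc($2) : lc($2);
    $from .= $chr;
    $to   .= $az;
}
binmode STDOUT => ":utf8";
print qq(tr[$from]\n  [$to];), "\n";
__END__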

And here is the output.

tr[ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ]
  [AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy];
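
If it helps, here is a minimal sketch of how that generated table might be dropped into a script. The sub name strip_accents is made up for illustration, and the sketch assumes the script file itself is saved as UTF-8 (hence "use utf8").

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # this file contains literal non-ASCII characters

# Hypothetical helper: map accented Latin-1 letters to plain ASCII
# using the table printed by the generator script.
sub strip_accents {
    my ($text) = @_;
    $text =~ tr[ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ]
               [AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy];
    return $text;
}

binmode STDOUT => ":utf8";
print strip_accents("Máìñtâíñêr"), "\n";   # prints "Maintainer"
```

Since tr/// maps one character to exactly one character, this only covers cases where a single ASCII letter is an acceptable substitute.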

In this kind of case, however, a simple tr/// won't cut it. Consider Schrödinger. Usually you spell that "Schroedinger", not "Schrodinger". Because tr/// can only map one character to one character, you have to resort to s///g for most cases.
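
A rough sketch of the s///g approach, under the assumption that you want German-style spellings. The %expand table and the sub name transliterate are purely illustrative; which replacements are right depends on the language of the text, so treat the table as a starting point only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # this file contains literal non-ASCII characters

# Assumed replacements (German convention); extend or change as needed.
my %expand = (
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ß' => 'ss',
);

# Build a character class from the keys; none of them is special
# inside [...], so no quoting is needed here.
my $class = join '', keys %expand;

# Hypothetical helper: replace each listed character with its
# multi-character spelling.
sub transliterate {
    my ($text) = @_;
    $text =~ s/([$class])/$expand{$1}/g;
    return $text;
}

binmode STDOUT => ":utf8";
print transliterate("Schrödinger"), "\n";   # prints "Schroedinger"
```

Characters not listed in the table are left alone, so anything still accented afterwards can be handled by a one-to-one tr/// table.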

Dàñ thè Ëñçôdé Máìñtâíñêr
