Re: intelligent lexically encoding

On Sep 08, 2005, at 12:39 , Jerzy Giergiel wrote:

Neither of those fallbacks is OK; I want á converted to the accent-stripped version of itself, i.e. a. The second solution isn't very helpful either: it's basically a tr replacement table, which is not much fun to write when the majority of the upper 128 characters need to be converted. There's gotta be a simpler and more elegant solution. Thanks anyway.

Well, it's not that hard to write a tr version if you let perl do the job.


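#!/usr/bin/perl
use strict;
use warnings;
use charnames qw(:full);
my ($from, $to) = ('', '');
for my $ord (0x80..0xff){
    my $chr = chr $ord;
    # Unicode name, e.g. "LATIN SMALL LETTER A WITH ACUTE"
    my $name = charnames::viacode($ord) or next;
    # keep only letters whose name has the form "... LETTER X WITH ..."
    $name =~ /(SMALL|CAPITAL) LETTER ([A-Z]) WITH/ or next;
    # names are all uppercase, so lowercase the base letter for SMALL
    my $az = $1 eq 'CAPITAL' ? $2 : lc($2);
    $from .= $chr;
    $to   .= $az;
}
binmode STDOUT => ":utf8";
print qq(tr[$from]\n  [$to];), "\n";
__END__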

And here is the output.

tr[ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ]
  [AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy];
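
If you want to try the generated table out, something like the following should do. It is only a sketch: strip_accents is a name I made up for illustration, and it assumes the script itself is saved as UTF-8 (hence the use utf8).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this file contains non-ASCII literals, saved as UTF-8

# Hypothetical helper: apply the generated tr table to a copy of the string.
sub strip_accents {
    my ($s) = @_;
    $s =~ tr[ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ]
            [AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy];
    return $s;
}

binmode STDOUT => ":utf8";
print strip_accents("Dàñ thè Ëñçôdé Máìñtâíñêr"), "\n";
# prints: Dan the Encode Maintainer
```

Characters that are not in the table (Æ, Ð, Þ, ß and so on) pass through unchanged, since their names do not have the "LETTER X WITH" form.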

In this kind of case, however, a simple tr/// won't cut it. Consider Schrödinger. Usually you spell that "Schroedinger", not "Schrodinger". So you have to resort to s///g for most cases.
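
A minimal sketch of the s///g approach, assuming a German-style convention for umlauts. The %expand table and the transliterate name are mine, for illustration only; the right replacements depend on the language you are dealing with, so treat the table as a starting point rather than a complete answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this file contains non-ASCII literals, saved as UTF-8

# Illustrative one-to-many replacements (assumed German-style convention).
my %expand = (
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    'ß' => 'ss',
);

# Build a character class out of the keys so the table is the only
# place that needs editing.
my $class = join '', keys %expand;

# Hypothetical helper: replace each listed character with its expansion.
sub transliterate {
    my ($s) = @_;
    $s =~ s/([$class])/$expand{$1}/g;
    return $s;
}

binmode STDOUT => ":utf8";
print transliterate("Schrödinger"), "\n";
# prints: Schroedinger
```

Anything not listed in %expand is left alone, so you can still run a tr table over the result afterwards to catch the remaining accented letters.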


Dàñ thè Ëñçôdé Máìñtâíñêr
