G'day Unicode Gurus and other assorted members of the Perl Unicode
community.
I have a script that attempts to collect translations from Babelfish.
I've posted it below.
It uses LWP::UserAgent to turn an English phrase into Japanese (or any
other language supported by Babelfish).*
However, once I get the translation out of the page it appears to be
full of null bytes. I've tried various things like Unicode::String or
Encode, but to no avail.
The script below just does the grab-and-extract, with no Unicode handling.
Please tell me what I should be doing, and at what point, to
extract the correct information.
* Please note: I'm not expecting a great translation, so don't bother
pointing out that the German for "Report a bug" is "Tell about a cockroach".
I just need something that I can use until a translator has done a real
translation.
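#!/usr/bin/perl
use strict;
use warnings;
use URI::Escape;
use LWP::UserAgent;

# Build the query from the command-line arguments.
my $escape = uri_escape(join(' ', @ARGV));
my $ua = LWP::UserAgent->new;
my $response =
    $ua->get("http://babelfish.altavista.com/tr?trtext=$escape&lp=en_ja");

my $result;
if ($response->is_success) {
    $result = $response->content;
} else {
    die $response->status_line;
}

# Pull the translation out of the result cell.
my ($translation) = $result =~ m{\Q<td bgcolor=white class=s><div style=padding:10px;>\E(.+?)\Q</div>\E};
die "No translation found in the response\n" unless defined $translation;

# Print the text, its length, and the code of its first character.
print $translation . "\n"
    . length($translation) . "\n"
    . ord(substr($translation, 0, 1));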
__END__