perl-unicode

Re: utf8, japanese, web-pages, the horror, the horror...

2004-05-08 01:30:07
Thanks for your advice... the output does look different, this time, but it still doesn't look like utf8... (I get the same error with recode).

If somebody could suggest a way to convert to another encoding, or a better way to identify the encoding of each page, that would also be fine: once I have control over the encodings, I think I can find some way to convert back to utf8 (e.g., via recode).

Thanks again,

Marco

On Saturday, May 8, 2004, at 05:16 Europe/Rome, Edward Batutis wrote:

Marco:

I think you are converting twice:

# output will be utf8
binmode(STDOUT, ":utf8");
...
                from_to($html_text,$charset,"utf8");
...

Here, the print will convert $html_text to utf-8 a second time, because
binmode has put a :utf8 layer on STDOUT:

                print "CURRENT URL $url\n$html_text\n";

I think you can just remove the binmode line and it will work.
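
To make the double conversion visible, here is a minimal sketch. The sample bytes (the EUC-JP encoding of two kanji) and the in-memory filehandle are assumptions for illustration, standing in for a fetched page and for STDOUT; only from_to and the ":utf8" layer come from the script above.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical sample: two kanji as EUC-JP bytes, standing in for a fetched page.
my $html_text = "\xC6\xFC\xCB\xDC";
my $charset   = "euc-jp";

# from_to converts in place and leaves UTF-8 *bytes* (not a character string).
from_to($html_text, $charset, "utf8");

# Print through a ":utf8" layer, as binmode(STDOUT, ":utf8") would arrange.
# An in-memory filehandle is used here so the result can be measured.
my $double = '';
open(my $fh, '>:utf8', \$double) or die "cannot open in-memory handle: $!";
print $fh $html_text;
close($fh);

# The layer treats every byte as a Latin-1 character and encodes it again,
# so each byte above 0x7F becomes two bytes on output.
printf "after from_to:         %d bytes\n", length $html_text;
printf "after the :utf8 layer: %d bytes\n", length $double;
```

The first count is 6 and the second is 12: the text was already valid utf-8 after from_to, and the layer encoded it once more. Dropping the binmode line avoids the second pass.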

Why do encodings always cause so much pain?

I hope this helps today's pain, at least :-).

Regards,

=Ed