Re: utf8, japanese, web-pages, the horror, the horror...
2004-05-08 01:30:07
Thanks for your advice... the output does look different this time,
but it still doesn't look like utf8 (and I get the same error with
recode).
If somebody could suggest a way to convert to another encoding, or a
better way to identify the encoding of each page, that would also be
fine: once I have control over the encodings, I think I can find some
way to convert back to utf8 (e.g., via recode).
Thanks again,
Marco
On Saturday, May 8, 2004, at 05:16 Europe/Rome, Edward Batutis wrote:
Marco:
I think you are converting twice:
# output will be utf8
binmode(STDOUT, ":utf8");
...
from_to($html_text,$charset,"utf8");
...
Here, it will convert html_text to utf-8 again because of binmode with
utf-8:
print "CURRENT URL $url\n$html_text\n";
I think you can just remove the binmode line and it will work.
Why do encodings always cause so much pain?
I hope this helps today's pain, at least :-).
Regards,
=Ed
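
The double conversion Ed describes can be reproduced in isolation. What follows is only a minimal sketch, not the original script: the sample text and the variable names $double and $chars are assumptions made up for illustration, and encode() is used here to imitate what a ":utf8" layer on STDOUT does to a string of raw bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# Hypothetical sample standing in for a fetched page: two kanji as EUC-JP octets.
my $charset   = "euc-jp";
my $html_text = encode($charset, "\x{65E5}\x{672C}");

# from_to() converts the octets in place. Afterwards $html_text holds
# UTF-8 *bytes* (6 of them), not Perl character data.
from_to($html_text, $charset, "utf8");

# A ":utf8" layer treats each of those bytes as a Latin-1 character and
# encodes it again, which is the second conversion. Imitated here with encode():
my $double = encode("utf8", $html_text);    # 12 bytes, no longer readable

# One way out: decode to characters once and let the output layer do the
# only encoding step.
my $chars = decode("utf8", $html_text);     # 2 characters
binmode(STDOUT, ":utf8");
print "$chars\n";
```

The other way out is the one suggested above: keep from_to() and drop the binmode line, so the already-encoded bytes are printed untouched.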