Thanks to all those who replied to my query about finding out the
encoding of a text that *should* have been in 7bit ASCII.
In the end, I dealt with the problem in the following way: I captured
the STDERR output of the script that gave the Malformed UTF-8 character
warnings and I processed it with a regexp to extract the lines where
the warnings were sent. Then, a colleague and I fixed all those lines
with a mix of hexdumping, text editing, deciding the right symbol on
the basis of the context, and simple scripts.
Regards,
Marco
---
Marco Baroni
University of Bologna
http://sslmit.unibo.it/~baroni