perl-unicode

Re: PS (Malformed UTF-8 character)

2003-10-26 01:30:05
Thanks for your quick reply!

> Looks like your theory about the input data being "in ascii (with entity
> references...)" is contradicted by the evidence.

Indeed.

> So now you need to determine what character encoding is being used for
> the non-ascii codes, which are obviously present in the data.  When you
> look at the file and you see a c with cedilla, can you tell whether
> this is actually the appropriate character, based on its context?  Is this
> true of all such characters?
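One way to do that check without depending on how the shell renders the bytes is to dump every non-ASCII byte with its offset and a little surrounding text. The sketch below is hypothetical (`non_ascii_report` is not from this thread). It decodes the context as ISO-8859-1 only because that decoding never fails, so every byte shows up as some character. That choice is an assumption for inspection, not a claim about the file's real encoding.

```python
def non_ascii_report(data: bytes, width: int = 8) -> list[tuple[int, str, str]]:
    """Return (offset, hex value, context) for every byte >= 0x80.

    Hypothetical helper. Context is decoded as ISO-8859-1 (Latin-1) only
    because every byte maps to some character there; it is an inspection
    aid, not a statement about the data's actual encoding.
    """
    report = []
    for i, b in enumerate(data):
        if b >= 0x80:
            start, end = max(0, i - width), i + width + 1
            context = data[start:end].decode("latin-1")
            report.append((i, f"0x{b:02X}", context))
    return report


if __name__ == "__main__":
    with open("input.txt", "rb") as f:  # hypothetical file name
        for offset, hex_byte, context in non_ascii_report(f.read()):
            # ascii() keeps the output readable even on a non-UTF-8 terminal
            print(offset, hex_byte, ascii(context))
```

If the data were ISO-8859-1 (or Windows-1252), a c with cedilla would appear as the single byte 0xE7. If it were UTF-8, the same letter would appear as the pair 0xC3 0xA7, which reads as "Ã§" when viewed as Latin-1.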

I do not see a c with cedilla; I see a rhombus with a question mark inside (which is how my shell displays non-ASCII characters). From the context, I guess it is a c with cedilla.
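The rhombus with a question mark is the usual glyph for U+FFFD REPLACEMENT CHARACTER. A terminal that expects UTF-8 typically shows it when it receives bytes that are not valid UTF-8, which would also fit the "Malformed UTF-8 character" warning in the subject line. The sketch below assumes a UTF-8 terminal and a hypothetical sample word; `terminal_view` is an illustrative name, not a real tool.

```python
def terminal_view(raw: bytes) -> str:
    """Roughly mimic a UTF-8 terminal: invalid byte sequences become U+FFFD.

    Hypothetical helper for illustration; real terminals may differ in detail.
    """
    return raw.decode("utf-8", errors="replace")


# Hypothetical sample: the word "garcon" with a cedilla, stored as ISO-8859-1.
latin1_bytes = b"gar\xe7on"
# 0xE7 starts a 3-byte UTF-8 sequence, but 'o' is not a continuation byte,
# so the decoder substitutes U+FFFD for it.
print(ascii(terminal_view(latin1_bytes)))
# The same bytes read as Latin-1 give the expected letter.
print(ascii(latin1_bytes.decode("latin-1")))
```

So a replacement glyph where a c with cedilla belongs suggests the file is in some single-byte encoding rather than UTF-8, though it does not yet say which one.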

So I would like to ask you, or anybody else: is there some kind of tool (e.g., a text editor) that I could use to discover which encoding is being used? (I tried emacs, but without success.)
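No tool can identify an 8-bit encoding with certainty, but a common heuristic is easy to sketch. Strict UTF-8 decoding rarely succeeds by accident on legacy 8-bit text, so it is tried first. Bytes 0x80 to 0x9F are C1 control codes in ISO-8859-1 but printable punctuation in Windows-1252, so their presence hints at the latter. `guess_encoding` is a hypothetical name, and the fallback to Latin-1 is an assumption: it cannot be told apart from ISO-8859-15 or other single-byte encodings, where only context (as discussed above) can decide.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristic guess between ascii, utf-8, cp1252 and latin-1.

    Hypothetical helper; a guess, not a detection guarantee.
    """
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict: raises on any malformed sequence
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # C1 range: control codes in Latin-1, printable characters in cp1252.
    if any(0x80 <= b <= 0x9F for b in data):
        try:
            data.decode("cp1252")  # a few bytes (e.g. 0x81) are undefined
            return "cp1252"
        except UnicodeDecodeError:
            pass
    # Assumption: fall back to Latin-1; other 8-bit encodings look identical.
    return "latin-1"


if __name__ == "__main__":
    with open("input.txt", "rb") as f:  # hypothetical file name
        print(guess_encoding(f.read()))
```

For example, "garçon" saved as UTF-8 is reported as `utf-8`, while the same word saved as ISO-8859-1 falls through to `latin-1`.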

Thanks again.

Marco



---
Marco Baroni
University of Bologna
http://sslmit.unibo.it/~baroni
