Re: PS (Malformed UTF-8 character)

Thanks for your quick reply!

Looks like your theory about the input data being "in ascii (with entity
references...)" is contradicted by the evidence.


Indeed.


So now you need to determine what character encoding is being used for
the non-ascii codes, which are obviously present in the data.  When you
look at the file and you see a c with cedilla, can you tell whether this
is actually the appropriate character, based on its context?  Is this
true of all such characters?

I do not see a c with cedilla, I see a rhombus with a question mark inside (which is the way my shell displays non-ASCII characters). I guess it is a c with cedilla from the context.

So, I would like to ask you or anybody else: is there some kind of tool (e.g., a text editor) that I could use to discover which encoding is being used? (I tried with emacs but failed).
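
For illustration only, here is a minimal Python sketch of one way to approach this kind of question: list the byte values above 0x7F that occur in the data, then decode the data under a few candidate encodings and compare the results against what the context suggests. The candidate list, the function names, and the sample bytes are all assumptions made for this sketch, not something taken from the thread, and the approach only narrows down the possibilities rather than proving which encoding was used.

```python
from collections import Counter

# Assumed candidates; replace with the encodings plausible for your data.
CANDIDATES = ("utf-8", "latin-1", "cp1252", "mac_roman")


def non_ascii_bytes(data: bytes) -> Counter:
    """Count each byte value above 0x7F that occurs in the data."""
    return Counter(b for b in data if b > 0x7F)


def decode_with_candidates(data: bytes, candidates=CANDIDATES) -> dict:
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: the single byte 0xE7 is c with cedilla in Latin-1.
sample = b"gar\xe7on"

for value, count in non_ascii_bytes(sample).items():
    print(f"byte 0x{value:02X} occurs {count} time(s)")

for name, text in decode_with_candidates(sample).items():
    print(name, "->", "cannot decode" if text is None else repr(text))
```

To try this on a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to both functions. A candidate that fails to decode can be ruled out; among those that succeed, the one that yields the character expected from the context is the most likely.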


Thanks again.

Marco

---
Marco Baroni
University of Bologna
http://sslmit.unibo.it/~baroni
