baroni(_at_)sslmit(_dot_)unibo(_dot_)it said:
> I see a rhombus with a question mark inside (which is the way my
> shell displays non-ASCII characters). I guess it is a c with cedilla
> from the context.
>
> So, I would like to ask you or anybody else: is there some kind of
> tool (e.g., a text editor) that I could use to discover which
> encoding is being used?

The first thing to do is get a hexadecimal dump of the data, to see what
the actual byte sequence is. The Unix "od" utility is good for this
(e.g. "od -c" or "od -t x1"), and Emacs has a mode for viewing data in
hex (M-x hexl-mode). Once you see what byte codes are being used to
represent a c-cedilla (and/or other non-ASCII characters that are
clearly inferable from context), you can scan through the various
cross-mapping code tables that are available for inspection or download
at unicode.org (http://www.unicode.org/Public/MAPPINGS/).
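
If "od" is not at hand, a few lines of Python give much the same view.
This is only a rough sketch: the function names and the sample bytes are
made up for illustration, and the sample assumes a c-cedilla stored as
the single byte 0xE7.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an od-like dump: offset, hex byte values, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)


def non_ascii_bytes(data: bytes):
    """List (offset, byte value) for every byte outside the ASCII range."""
    return [(i, b) for i, b in enumerate(data) if b >= 0x80]


# Hypothetical sample: "garcon" with the c-cedilla stored as one byte.
sample = b"gar\xe7on"
print(hex_dump(sample))
print([(i, hex(b)) for i, b in non_ascii_bytes(sample)])
```

For a real file, read it in binary mode (open(path, "rb").read()) so
that no decoding happens before you look at the bytes.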
As a clue, if you see a two-byte sequence for each accented character,
whereas the plain-ASCII characters are all single-byte, then the data
is probably in UTF-8. Another clue for this is that the first byte of
a two-byte character always falls in the range 0xC2-0xDF and the second
in 0x80-0xBF; for the accented letters of most Western European
languages the first byte is nearly always 0xC3 (c-cedilla, for
instance, is 0xC3 0xA7).
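
The same check can be done mechanically. The sketch below leans on
Python's strict UTF-8 decoder rather than on reading the dump by eye;
the function names are invented for this example, and a pass only means
"probably UTF-8", not a proof.

```python
from collections import Counter


def looks_like_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 and has at least one non-ASCII byte.

    Pure ASCII also decodes fine, but it carries no evidence either way,
    so it is reported as False here.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)


def lead_byte_counts(data: bytes) -> Counter:
    """Count the lead bytes of multi-byte sequences (0xC2..0xF4).

    Only meaningful once looks_like_utf8() has returned True.
    Continuation bytes (0x80..0xBF) are skipped.
    """
    return Counter(b for b in data if 0xC2 <= b <= 0xF4)


utf8_sample = b"gar\xc3\xa7on"     # c-cedilla as two bytes
single_byte_sample = b"gar\xe7on"  # c-cedilla as one byte

print(looks_like_utf8(utf8_sample))
print(looks_like_utf8(single_byte_sample))
print({hex(b): n for b, n in lead_byte_counts(utf8_sample).items()})
```

A lone high byte followed directly by an ASCII letter, as in the second
sample, is not valid UTF-8, which is why single-byte data rarely passes
this test by accident.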
On the other hand, if all characters appear to be single-byte, you'll
need to look for the name of the inferred character (e.g. "LATIN SMALL
LETTER C WITH CEDILLA") in the various cross-mapping tables, and
determine which table has the appropriate byte code that matches your
data for this character.
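
As a shortcut for a first pass, Python's bundled codecs can stand in for
the mapping tables. This is a sketch under assumptions: the candidate
list is my own selection of common single-byte encodings, not an
exhaustive one, and the codecs are Python's, not the unicode.org files
themselves.

```python
import unicodedata

# Assumed shortlist of single-byte encodings to try; extend as needed.
CANDIDATES = ["latin_1", "cp1252", "iso8859_15", "cp437", "cp850", "mac_roman"]


def encodings_matching(char_name: str, observed: bytes, candidates=CANDIDATES):
    """Return the candidate encodings that map the named character to `observed`."""
    char = unicodedata.lookup(char_name)
    matches = []
    for enc in candidates:
        try:
            if char.encode(enc) == observed:
                matches.append(enc)
        except UnicodeEncodeError:
            # This encoding has no code for the character at all.
            continue
    return matches


name = "LATIN SMALL LETTER C WITH CEDILLA"
print(encodings_matching(name, b"\xe7"))
print(encodings_matching(name, b"\x87"))
print(encodings_matching(name, b"\x8d"))
```

One character will often match several tables (0xE7 fits more than one
of the encodings above), so repeat the lookup with every other non-ASCII
character you can infer from context and keep only the encodings that
match all of them.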
Dave G.