Daniel Heiserer <Daniel(_dot_)Heiserer(_at_)bmw(_dot_)de> writes:
> Hi,
> my experience with Unicode is very limited, so my questions may look
> stupid...
> I would like to use Perl for a dictionary. As different languages use
> different letters (even the European ones like Spanish, German, etc.),
> I would like to use these.
Most Western European languages are covered by the 8-bit encoding
iso8859-1 (the same one you use for German). Other iso8859-* 8-bit
encodings cover Greek and the languages written in Cyrillic.
But if you want to show phonetic transcriptions, you need a wider repertoire.
(Needing phonetics was the reason I started down the UTF-8 road myself...)
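A quick way to see the difference, sketched in Python purely for illustration (the helper `fits_latin1` and the sample words are hypothetical, not part of perl or Tk): German and Spanish words encode in iso8859-1, while an IPA transcription does not and needs a Unicode encoding such as UTF-8.

```python
# Illustrative sketch: which strings fit in iso8859-1, and which need UTF-8?
# fits_latin1 and the sample words are hypothetical examples.

def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in iso8859-1 (Latin-1)."""
    try:
        text.encode("iso8859-1")
        return True
    except UnicodeEncodeError:
        return False


samples = [
    ("German", "Straße"),
    ("Spanish", "año"),
    ("IPA for 'Straße'", "ʃtʁaːsə"),  # uses IPA letters outside Latin-1
]

for label, word in samples:
    print(label, word, "latin-1:", fits_latin1(word),
          "utf-8 bytes:", len(word.encode("utf-8")))
```

The first two report `True`; the IPA string reports `False`, and in UTF-8 each of its non-ASCII phonetic symbols takes two bytes.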
> Using the terminal in Linux/Unix for input/output, I guess I would
> need a UTF-8 enabled terminal, right?
Don't know for sure - terminals on Linux are new to me.
> Assume I use a GUI like perl/Tk for input and output. How can I ensure
> that UTF-8 is supported there?
The short answer is that no _released_ perl/Tk supports UTF-8 yet.
I am working on it, but as most of the snags are in the perl part,
I have been working on perl5.7.* rather than on Tk803.*
But _ALL_ Tk versions support iso8859-1, which covers the characters you
need for most (Western) European languages, and they can display text in
other 8-bit encodings if you tell them to use an appropriately encoded X font.
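Why the font's encoding matters can be shown with a single byte: an 8-bit encoding only says which of 256 positions a byte selects, so the same byte is a different character under each iso8859-* table. A minimal Python illustration (not perl/Tk code):

```python
# The same byte value decoded under three different 8-bit encodings.
raw = bytes([0xE4])

for encoding in ("iso8859-1", "iso8859-5", "iso8859-7"):
    ch = raw.decode(encoding)
    print(f"{encoding}: byte 0xE4 -> {ch} (U+{ord(ch):04X})")
```

This prints ä (U+00E4) for iso8859-1, ф (U+0444) for iso8859-5 (Cyrillic) and δ (U+03B4) for iso8859-7 (Greek), which is why the text must be drawn with an X font in the matching encoding.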
> Do I need UTF-8 fonts for perl/Tk?
The Tcl/Tk code that handles UTF-8 displays character glyphs by hunting
through the available fonts for one that has the glyph it needs. This
works after a fashion. The snag is that the process can take a long time
(tens of seconds on a 300MHz machine) and often picks glyphs that don't
match the "style" of the others in the string.
So perl/Tk is likely to change this to try iso10646-1 fonts before
(or instead of) doing that search.
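The font hunt, and the idea of trying an iso10646-1 font first, can be modelled roughly as below. This is a hypothetical Python sketch, not Tcl/Tk's actual algorithm: each "font" is just a name plus the set of code points it is assumed to cover, and the font names and coverage ranges are made up.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical fonts: (name, set of code points the font is assumed to cover).
ASCII = set(range(0x20, 0x7F))
LATIN1 = ASCII | set(range(0xA0, 0x100))
FONTS: List[Tuple[str, Set[int]]] = [
    ("helvetica-iso8859-1", LATIN1),
    ("fixed-iso8859-5", ASCII | set(range(0x401, 0x460))),  # Cyrillic
    ("fixed-iso10646-1", LATIN1 | set(range(0x250, 0x300))  # IPA, modifiers
                         | set(range(0x400, 0x460))),       # Cyrillic
]


def find_font(ch: str, fonts: List[Tuple[str, Set[int]]],
              prefer_suffix: Optional[str] = "iso10646-1") -> Optional[str]:
    """Return the first font covering ch.

    With prefer_suffix=None the fonts are scanned in list order (the plain
    "hunt"); otherwise fonts whose name ends in prefer_suffix are tried first.
    """
    cp = ord(ch)
    if prefer_suffix is None:
        order = list(fonts)
    else:
        preferred = [f for f in fonts if f[0].endswith(prefer_suffix)]
        others = [f for f in fonts if not f[0].endswith(prefer_suffix)]
        order = preferred + others
    for name, coverage in order:
        if cp in coverage:
            return name
    return None  # no font has a glyph for this character
```

Putting one large iso10646-1 font first means most characters are found in the first font examined, and a string's glyphs then tend to come from a single font, which addresses both snags mentioned above; the fallback scan only runs for characters that font lacks.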
> Do these exist?
As far as I know, there are no fonts encoded in UTF-8. There are
16-bit fonts encoded in iso10646-1 (which has the same code points as Unicode).
From a perl/Tk perspective the distinction should not matter to user code:
it is up to the Tk core to convert the UTF-8 encoded text that perl gives it
into 16-bit font indices.
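What converting UTF-8 into a 16-bit font index involves can be sketched by hand. This Python sketch is illustrative only, not Tk's code: `utf8_to_ucs2` and `font_index` are hypothetical names, only the 16-bit Basic Multilingual Plane is handled, and overlong or surrogate sequences are not rejected as a real decoder must. The split into a high and a low byte mirrors how Xlib's `XChar2b` structure (`byte1`, `byte2`) addresses glyphs in a 16-bit font.

```python
from typing import List, Tuple


def utf8_to_ucs2(data: bytes) -> List[int]:
    """Decode UTF-8 by hand into 16-bit code points (BMP only, minimal checks)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:              # 0xxxxxxx: ASCII, one byte
            cp, n = b, 1
        elif 0xC0 <= b < 0xE0:    # 110xxxxx 10xxxxxx: two bytes
            cp, n = b & 0x1F, 2
        elif 0xE0 <= b < 0xF0:    # 1110xxxx 10xxxxxx 10xxxxxx: three bytes
            cp, n = b & 0x0F, 3
        else:                     # 4-byte sequences do not fit in 16 bits
            raise ValueError(f"unsupported lead byte 0x{b:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError("truncated UTF-8 sequence")
        for cont in data[i + 1:i + n]:
            if (cont & 0xC0) != 0x80:
                raise ValueError("bad continuation byte")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        out.append(cp)
        i += n
    return out


def font_index(cp: int) -> Tuple[int, int]:
    """Split a 16-bit iso10646-1 index into (high byte, low byte)."""
    return cp >> 8, cp & 0xFF
```

For example, `ʃ` arrives from perl as the UTF-8 bytes C9 83 and comes out as code point 0x0283, i.e. high byte 0x02 and low byte 0x83 in the font.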
> Where can I get them?
See Markus's excellent intro:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
> How can I input data then? (I only have a keyboard covering the Latin
> characters; do I need a special keyboard driver?)
Linux can use two schemes (a compose key and dead keys) to input
characters in iso8859-1 or in the character encoding of the current locale.
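Both input schemes amount to a table lookup: a compose key starts a short sequence of ordinary keys that maps to one character, while a dead key (such as an acute accent) produces nothing itself and modifies the next key. A hypothetical Python sketch with a tiny, made-up subset of such tables (real X/Linux keyboard tables are much larger and configurable):

```python
from typing import Optional

# Hypothetical compose table: Compose, then two ordinary keys -> one character.
COMPOSE = {
    ('"', "a"): "ä",
    ("s", "s"): "ß",
    ("~", "n"): "ñ",
    ("'", "e"): "é",
}

# Hypothetical dead-key table: dead key, then a base key -> accented character.
DEAD_KEYS = {
    "dead_acute": {"e": "é", "a": "á"},
    "dead_diaeresis": {"u": "ü", "o": "ö"},
}


def compose(first: str, second: str) -> Optional[str]:
    """Return the character for a compose sequence, or None if undefined."""
    return COMPOSE.get((first, second))


def dead_key(dead: str, base: str) -> Optional[str]:
    """Return the character a dead key makes of the next key, or None."""
    return DEAD_KEYS.get(dead, {}).get(base)
```

Every character in these tables is in iso8859-1, so each composed result is a single byte in that encoding; the composed `ä`, for instance, is byte 0xE4.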
> Can I cut and paste input and output?
iso8859-1 is fine. For UTF-8 there is a proposal, which I will implement
in the UTF-8 aware perl/Tk, but other applications probably will not
support it yet.
> Thanks, Daniel
--
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.