perl-unicode

Re: unicode and perl TK

2001-01-11 07:56:30
Daniel Heiserer <Daniel(_dot_)Heiserer(_at_)bmw(_dot_)de> writes:
Hi,
my experience in Unicode is very small, so my questions might look
stupid.....

I would like to use perl for a dictionary. As different languages use
different letters (even the European ones like Spanish, German, etc.) I
would like to use these.

Most western european languages are covered by the 8-bit encoding iso8859-1.
(The same one you use for German.)
Other 8-bit encodings (iso8859-* etc.) cover Greek and the languages that
need Cyrillic.

But if you want to show phonetics then you need a wider repertoire.
(Needing phonetics was the reason I started on the UTF-8 road myself...)
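
To make the repertoire point concrete, here is a small sketch (in Python,
purely to show the encoding facts, nothing perl-specific) that asks which
encodings can represent a given word. The helper name fits() and the sample
words are my own illustrations, not anything from the original question.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Illustrative sample words (assumptions chosen for this sketch).
samples = {
    "German": "Straße",
    "Spanish": "mañana",
    "Greek": "λόγος",
    "Russian": "слово",
    "IPA": "ˈwɔːtə",
}

encodings = ("iso8859-1", "iso8859-7", "iso8859-5", "utf-8")

for label, word in samples.items():
    usable = [enc for enc in encodings if fits(word, enc)]
    # Print only the label and encoding names so the output stays ASCII.
    print(f"{label}: {', '.join(usable)}")
```

The German and Spanish words fit in iso8859-1, the Greek and Russian ones
need their own iso8859-* tables, and the phonetic transcription fits none of
the 8-bit encodings tried here, only UTF-8.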

Using the terminal in linux/unix for input/output I guess that I would
need a 
utf-8 enabled terminal, right?

Don't know for sure - terminals on Linux are new to me.


Assume I use a gui like perl-tk for input and output. How can I ensure
that utf-8 is supported there?

The short answer is that no _released_ perl/Tk does UTF-8 yet.
I am working on it - but as most of the snags are in the perl part
I have been working on perl5.7.* rather than Tk803.*

But _ALL_ Tk's will support iso8859-1 for the characters you need for
most (western) European languages, and they can display text in other
8-bit encodings if you tell them to use an appropriately encoded X font.

Do I need utf-8 fonts for perl-tk?

The Tcl/Tk code that does UTF-8 attempts to display character glyphs
by hunting through the available fonts looking for one that has
the glyph it needs. This works after a fashion. The snag is that the
process can take a long time (tens of seconds on a 300MHz machine),
and it often picks glyphs which don't match the "style" of the others
in the string.

So perl/Tk is likely to modify this to try iso10646-1 fonts before 
(or instead of) doing that.
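
The difference between hunting through all the fonts and trying an
iso10646-1 font first can be sketched roughly as follows. This is a toy
model in Python, not the Tcl/Tk implementation: the font names, their
coverage tables, and the functions pick_font() and fonts_for() are all
hypothetical.

```python
# Hypothetical font tables: font name -> set of code points it can draw.
FONTS = {
    "helvetica-iso8859-1": set(range(0x20, 0x7F)) | set(range(0xA0, 0x100)),
    "fixed-iso8859-7": set(range(0x20, 0x7F)) | set(range(0x0384, 0x03CF)),
    "fixed-iso10646-1": set(range(0x20, 0x3000)),
}

def pick_font(codepoint, preferred=()):
    """Return the first font with a glyph for codepoint, preferred fonts first."""
    order = list(preferred) + [name for name in FONTS if name not in preferred]
    for name in order:
        if codepoint in FONTS[name]:
            return name
    return None  # no font in the table has this glyph

def fonts_for(text, preferred=()):
    """Return the font chosen for each character of text."""
    return [pick_font(ord(ch), preferred) for ch in text]

# Plain hunting: each character may land in a different font.
print(fonts_for("a\u03bb"))
# Trying the iso10646-1 font first keeps the whole string in one font.
print(fonts_for("a\u03bb", preferred=("fixed-iso10646-1",)))
```

In this model the plain search draws the Latin letter and the Greek letter
from two different fonts, which is the "style" mismatch described above,
while preferring the iso10646-1 font gives one font for both.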

Do these exist?

There are not, as far as I know, any fonts encoded in UTF-8. There are
16-bit fonts encoded in iso10646-1 (which has the same codepoints as
Unicode). From a perl/Tk perspective the distinction should not matter to
user code (it is up to the Tk core to convert the UTF-8 encoded stuff that
perl gives it to a 16-bit font index).
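
As a rough illustration of that conversion (a sketch of the idea in Python,
not the actual Tk core code), the UTF-8 bytes are decoded to code points and
each code point is split into the two bytes of a 16-bit index. The function
name utf8_to_font_indices() is made up for this example.

```python
def utf8_to_font_indices(data: bytes):
    """Decode UTF-8 bytes and split each code point into (high, low) bytes."""
    indices = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFFFF:
            # A 16-bit font index cannot address code points beyond U+FFFF.
            raise ValueError(f"U+{cp:04X} does not fit in a 16-bit font index")
        indices.append((cp >> 8, cp & 0xFF))
    return indices

# U+00E9 (e with acute) and U+0254 (open o, used in phonetics).
utf8 = "\u00e9\u0254".encode("utf-8")   # b'\xc3\xa9\xc9\x94'
print(utf8_to_font_indices(utf8))       # [(0, 233), (2, 84)]
```

The two-byte UTF-8 sequences and the two-byte font indices are different
byte values for the same code points, which is why a font is described as
iso10646-1 encoded rather than UTF-8 encoded.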

Where can I get them? 

See Markus's excellent intro:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

How can I input data then (I only have a keyboard covering the latin
characters, do I need a special keyboard driver?)?

Linux can use two schemes (compose key and dead keys) to input 
characters in iso8859-1 or the local "locale" character encoding.

Can I cut and paste input and output?

iso8859-1 is fine. UTF-8 has a proposal which I will implement
in the UTF-8 aware perl/Tk - but other applications probably will
not support that yet.


thanks daniel
-- 
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.
