Re: UTF-8 Terminals

Jarkko Hietaniemi wrote on 2001-11-09 22:02 UTC:

> I think that "displaying UTF-8 text" is quite a difficult task.  Not
> only would you need a really large font -- both a number of glyphs (or
> an ingenious font switching scheme), and to support the most intricate
> CJK glyphs I hear that at least a 20pt font is required.


Hardly anyone needs full Unicode. If all you are interested in are
European scripts and symbols for instance, then the 3 kilocharacters of
the Unicode subset MES-3 are more than good enough for your needs, and
the XFree86 standard xterm fonts 6x13, 8x13, 9x15, 9x18, 10x20 have
covered MES-3 for over a year now and are widely used.

People who can read CJK glyphs have used larger font sizes so far and
will continue to do so in the future. Font size has nothing whatsoever
to do with the encoding. It would be silly to decide (as Netscape 4 did
:-( ) that every Unicode font has to be large enough to be able to
represent every script covered by Unicode. On the contrary: XFree86
ships with 4x6 and 5x8 pixel Unicode fonts, and people do use these with
xterm. For the 6x13 standard xterm "fixed" font, we even have a 12x13
doublewidth Japanese supplement that quite a number of Japanese users of
800x600 laptops have found very useful, even though its resolution is
really too small for CJK typographic needs (and Chinese and Korean users
regularly ask me when 12x13 will be extended to cover their glyph
repertoires as well). XFree86 also has a 9x18/18x18 terminal font with
good CJK coverage, and I have no doubt that others will eventually
provide even larger and nicer terminal fonts.

> Moreover, the old way of thinking "one codepoint, one box" isn't going
> to work with combining characters (and keeping on piling the combining
> characters pushes the capabilities of the font rendering).  Don't forget
> ligatures, and I do not mean only the Latin ones: think Arabic, or Indic.


The xterm shipping with XFree86 has supported a simple form of combining
characters (in particular motivated by Thai/Maths/IPA requirements)
for over a year. This stuff is admittedly a bit more experimental, as
not all UTF-8 aware command-line tools are also handling combining
characters perfectly, but there is at least a growing Thai community
enjoying the xterm support for simple overstriking combining characters.
There are also at least two terminal editors in wide use now that support
combining characters under xterm: vim 6.0 (the commonly used vi clone)
and mined.
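
To illustrate what "simple overstriking" means for the cell grid, here is
a minimal Python sketch. It is not xterm's actual code; the function names
cell_width and display_width are made up for this example, and control
characters and other special cases are deliberately ignored. The idea is
only that a combining mark takes no cell of its own, while a doublewidth
CJK glyph takes two.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Rough guess at how many character cells one code point occupies.

    Simplified assumption: combining marks are overstruck onto the
    previous cell (width 0), East Asian Wide/Fullwidth characters take
    two cells, and everything else takes one.
    """
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # drawn on top of the preceding base character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # doublewidth glyph, e.g. from a 12x13 supplement font
    return 1

def display_width(text: str) -> int:
    """Total number of cells a string would occupy under the rule above."""
    return sum(cell_width(ch) for ch in text)
```

With this rule, "e" followed by U+0301 COMBINING ACUTE ACCENT still fills
one cell, and a Thai consonant with a vowel sign above it does too, which
is why this case fits the terminal model without much trouble.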

VT100-style UTF-8 terminal emulators will for the foreseeable future
not have full and well-established support for Hebrew, Arabic, Syriac,
and Indic, because the bidi and ligature substitution requirements
clash significantly with the simple typewriter rendering model of
a VT100. Hebrew and Arabic are just about doable and there are experimental
implementations by e.g. Robert Brady and others, but Indic and Syriac
have not even been seriously discussed.
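
As a rough way to see which text runs into this problem, the following
Python sketch flags strings containing characters of the right-to-left
bidirectional classes. The name needs_bidi is hypothetical, and the check
is only a detection heuristic: it does no reordering and says nothing
about ligature or contextual shaping, which is the harder part.

```python
import unicodedata

# Bidirectional classes for strongly right-to-left characters:
# "R" (e.g. Hebrew) and "AL" (Arabic letters, which also covers Syriac).
RTL_CLASSES = {"R", "AL"}

def needs_bidi(text: str) -> bool:
    """True if the text contains any strongly right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)
```

Text for which this returns False can be laid out left to right, one cell
after another, as far as direction is concerned. Text for which it
returns True needs reordering somewhere, either in the terminal or in the
application.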

> Mind, I would be (pleasantly) surprised if there really is a 'terminal'
> that can do justice to the intricacies of Unicode.  At the time, Plan
> 9's 9term was probably close, but Unicode has moved on since.  On an
> xterm, sure, you can have the fonts, but probably not the combining
> characters.  Yudit, ditto.


You obviously haven't used xterm recently in a UTF-8 locale. Look at the
attached UTF-8 file with "vim 6.0" or "cat 0.94c" or newer
in a UTF-8 locale!
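
If you are unsure whether your session is in a UTF-8 locale, the
following Python sketch gives a first approximation. It only inspects the
usual POSIX environment variables in their order of precedence (LC_ALL,
then LC_CTYPE, then LANG) and looks at the codeset suffix of the locale
name. The function names are made up for this example, and a locale whose
name carries no codeset suffix is simply reported as not UTF-8, which may
be wrong on some systems.

```python
def locale_codeset(env) -> str:
    """Codeset part of the first non-empty variable among LC_ALL, LC_CTYPE, LANG.

    Locale names have the form language[_territory][.codeset][@modifier].
    Returns "" if no variable is set or the name has no codeset part.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            without_modifier = value.partition("@")[0]
            return without_modifier.partition(".")[2]
    return ""

def is_utf8_locale(env) -> bool:
    """Heuristic: does the effective locale name advertise UTF-8?"""
    return locale_codeset(env).upper().replace("-", "") == "UTF8"
```

Calling is_utf8_locale(os.environ) after "import os" applies the check to
the current session.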

For an update:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#xterm

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

sg1.txt