perl-unicode

Re: Unicode::Map

1999-11-17 08:56:02

    Bruce> I'm still not sure that FreeType is fully handling some of the TTF
    Bruce> Unicode.  I know that ttf2bdf for platform 3, encoding 1
    Bruce> (Windows/Unicode) is generating some grungy fonts at times, which
    Bruce> has made me cautious about installing a FreeType enabled
    Bruce> fontserver.  I'm still not sure of Mac/Unicode in-font tabling, but
    Bruce> most Fontographer fonts floating around don't have that.

A recent version had some fairly serious rendering bugs.  Those were fixed at
least three months ago, and a new version is going to be released in the near
future.  The most recent distribution is always available at
http://www.freetype.org.

    Bruce> Also, with some (e.g. indic) languages, a common entry method is
    Bruce> romanized syllables.  This type of encoding should also be parsable
    Bruce> for autoconversion to Unicode, as it is to the many encodings
    Bruce> supported by "itrans", and its successor (in development)
    Bruce> "iscript".

This would be *very* nice to have, but it will take a sophisticated system to
handle the wide variety of encodings out there.  A good example is my
converter from Naidunia Devanagari to UCS-2, written in Perl:

   http://crl.nmsu.edu/~mleisher/nai.html

This sort of thing pretty much takes a programming language and a bit of a
priori knowledge about the documents being converted.  Other encodings such as
VIQR (Vietnamese), the old N-byte Hangul, and the various Arabic and Persian
font encodings need their own kind of special handling as well.
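
To illustrate why a plain lookup table is not enough, here is a minimal sketch
in Python (not the Perl converter mentioned above).  The byte values in the
table are invented for illustration and do not describe Naidunia or any real
font; the function name is likewise hypothetical.  The one piece of a priori
knowledge it encodes is that visually ordered font encodings store the short-i
vowel sign before its consonant, while Unicode stores it after.  Conjuncts and
everything else a real converter needs are ignored.

```python
# Hypothetical glyph table: these byte values are made up for illustration.
GLYPH_TO_UNICODE = {
    0x6B: "\u0915",  # DEVANAGARI LETTER KA
    0x72: "\u0930",  # DEVANAGARI LETTER RA
    0x61: "\u093E",  # DEVANAGARI VOWEL SIGN AA
    0x66: "\u093F",  # DEVANAGARI VOWEL SIGN I (drawn left of its consonant)
}

I_MATRA = "\u093F"


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants KA..HA (U+0915..U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def font_bytes_to_unicode(data: bytes) -> str:
    """Map font bytes to Unicode, moving a short-i sign after its consonant."""
    out = []
    held = ""  # a short-i sign waiting for the consonant that follows it
    for b in data:
        # Assumption: unmapped bytes are passed through unchanged.
        ch = GLYPH_TO_UNICODE.get(b, chr(b))
        if ch == I_MATRA:
            if held:
                out.append(held)  # two signs in a row: flush the first
            held = ch
            continue
        if held and not is_consonant(ch):
            out.append(held)  # no consonant to attach to: emit in place
            held = ""
        out.append(ch)
        if held:
            out.append(held)  # ch is a consonant: the sign goes after it
            held = ""
    if held:
        out.append(held)  # trailing sign with nothing after it
    return "".join(out)
```

The resulting string can be written out as big-endian 16-bit units with
font_bytes_to_unicode(data).encode("utf-16-be"), which for these BMP
characters is the same byte sequence as UCS-2.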

    Bruce> At the same time, TSCII is growing pretty fast for Malaysia, but
    Bruce> "TAB" encoding has been endorsed by the Malaysian government as the
    Bruce> standard Tamil encoding - RATHER than the assigned Unicode block.
    Bruce> I haven't looked (yet) to see if this is merely a truncation to the
    Bruce> low-order byte of the Unicode or what.

I haven't heard about these in a while.  Do you have pointers you can send?
Also, I lost all my pointers to the RIT encoding for Telugu.  If you happen to
have any of those, it would be greatly appreciated!

    Bruce> which is distributed with dvedit, there is an enormous overlap in
    Bruce> the glyphs, but no relationship I can see in the encodings.  Yet as
    Bruce> I recall, the Jagran encoding is being used in some indic on-line
    Bruce> newspapers at the moment.

I am not familiar with dvedit.  Any pointers?

You have hit the nub of the problem.  With Indic fonts, just about every font
has a different glyph set.  The problem is less prevalent for other scripts,
but it does exist.  I have been working on a small system for writing simple
rendering rules for these cases.

  http://crl.nmsu.edu/~mleisher/contextnew.pdf

A much more sophisticated mapping/rendering system is available in the OTP
module of the freely available Omega/Lambda (TeX/LaTeX) typesetting system.

  http://www.ens.fr/omega

-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            I have never made but one prayer to God,
New Mexico State University       a very short one:
Box 30001, Dept. 3CRL                 "Oh Lord, make my enemies ridiculous."
Las Cruces, NM  88003             And God granted it.  -- Voltaire, letter
