perl-unicode

Re: Inverse of /\p{script}/

2003-08-29 16:47:10
On Fri, 29 Aug 2003 11:08:33 +0100, Nick Ing-Simmons 
<nick(_dot_)ing-simmons(_at_)elixent(_dot_)com> said:

  > But Cyrillic glyphs are likely double width :-(
  > This is one of the reasons I want to do _something_ in this area.
  > I don't want to even try and read a big 16-bit Japanese font
  > just to get Cyrillic (for a spammer's name) or Greek Sigma (for math).

  > The other thing that needs fixing is that Tk currently ignores 
  > any locale information that might be available. So for "unified" ideographs
  > it will use a font that has the character regardless of which "style" it is
  > in. So for Japanese it is quite likely to find a simplified Chinese style
  > font and use that for Han, then when it hits Katakana it will find 
  > an 8-bit (JIS X 0201?) font and use that for those, then when it finds
  > a Hiragana it will find a JIS X 0208 font. The result looks a mess even
  > to my occidental eyes.

  > What I am hoping to do for Tk804 is put in some kind of callback-to-Perl
  > hook, so that when Tk wants a font for a particular character it
  > can call into Perl and Perl will give it a strong push in a particular
  > direction.
  > Thus, for someone expecting Japanese, if asked for a Han character
  > it will suggest a JIS font, while for someone expecting Chinese it
  > will suggest a Big5 or GB2312 font as appropriate.

  > What gets really painful is the Unicode fonts - one has to look at
  > which characters a font has to decide if it is
  > Japanese/Simplified Chinese/Traditional Chinese/Korean or just a grab-bag
  > of glyphs the font designer had to hand.
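
To make the callback idea above a bit more concrete, here is a minimal
sketch of what such a hook might decide. It is written in Python purely
for illustration and is not Tk's or Perl's actual interface. The table
HAN_FONT_BY_LANG, the charset labels, and both function names are
assumptions of mine, and the script guess relies on Unicode character
names rather than a real Script property lookup.

```python
import unicodedata

# Hypothetical preference table: expected language -> font charset for Han.
# The labels are illustrative, not real Tk font names.
HAN_FONT_BY_LANG = {
    "ja": "jisx0208",
    "zh-TW": "big5",
    "zh-CN": "gb2312",
}


def rough_script(ch):
    """Guess a script from the Unicode character name (a heuristic only)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    for script in ("HIRAGANA", "KATAKANA", "CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script.capitalize()
    return "Other"


def suggest_font_charset(ch, lang):
    """Return a suggested font charset for ch, or None for 'no opinion'."""
    script = rough_script(ch)
    if script == "Han":
        # Same code point, different preferred glyph style per language.
        return HAN_FONT_BY_LANG.get(lang)
    if script in ("Hiragana", "Katakana") and lang == "ja":
        # Keep kana in the same font family as the Han glyphs.
        return "jisx0208"
    return None
```

Returning None leaves the toolkit free to fall back to its usual font
search, so Cyrillic or Greek would not drag in a large Japanese font.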

FWIW, I just found an old posting from the Mozilla developer Katsuhiko
Momoi. He explained:

     ... not everyone would tag their Unicode documents with a lang
     tag indicating what language that is. And Mozilla has a dependency
     on language for which font glyphs to use. For example, Unicode
     CJK ideographs are not necessarily rendered the same from
     language to language. The same code point may lead to different
     font glyphs dependent on what language it is. Unless everyone
     uses a lang tag, I may end up seeing a Japanese document with
     some Chinese glyphs. And I definitely don't want that! (See how
     fonts are set in the preference dialog -- according to language.
     But if language info is not available in the docs, we do our best
     by looking at the charset info -- a charset is a good secondary
     determining factor for some languages, e.g. Chinese, Japanese,
     Korean, etc. Thus, the notion of primary charset is still useful
     in this situation.)

(Cited from http://bugzilla.mozilla.org/show_bug.cgi?id=13393)
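
The fallback order he describes (lang tag first, charset as a secondary
hint) could be sketched roughly as below. This is only my illustration
in Python, not Mozilla's code. The CHARSET_LANG_HINT table and the
guess_language name are assumptions, and the table is deliberately
incomplete.

```python
# Hypothetical charset -> language hints; a real table would be longer.
CHARSET_LANG_HINT = {
    "shift_jis": "ja",
    "euc-jp": "ja",
    "iso-2022-jp": "ja",
    "gb2312": "zh-CN",
    "big5": "zh-TW",
    "euc-kr": "ko",
}


def guess_language(lang_tag=None, charset=None):
    """Prefer an explicit lang tag; otherwise fall back to the charset."""
    if lang_tag:
        return lang_tag
    if charset:
        # Charset names are case-insensitive, so normalise before lookup.
        return CHARSET_LANG_HINT.get(charset.lower())
    return None
```

A Unicode charset such as UTF-8 gives no hint at all here, which is
exactly the case where an untagged document stays ambiguous.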

Posting this just as a pointer to another project that may have developed
helpful code in that area...

-- 
andreas
