perl-unicode

RE: UTF-16 -> UTF-8

2001-11-21 11:21:45
There is no such thing as a UTF-8 encoded font. A font with a Unicode cmap
can be used by any software to render Unicode data from any source in any
format. It is up to the application to request rendering of characters in a
datatype supported by the platform, whether or not that is the application's
internal character format, or the format of input data.
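As a rough illustration of why a font cannot be "UTF-8" or "UTF-16", the sketch below decodes the same short text from three encoding forms and shows that each one yields the same code points, which are what a font's cmap is keyed on. The sample string and the helper name code_points are made up for this example.

```python
# Sample text: REVERSE SOLIDUS, SPACE, and one CJK ideograph (U+65E5).
SAMPLE = "\\ \u65e5"

# Three encoding forms of the same Unicode text.
ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be"]


def code_points(data: bytes, encoding: str) -> list[int]:
    """Decode bytes and return the Unicode code points they represent."""
    return [ord(ch) for ch in data.decode(encoding)]


if __name__ == "__main__":
    for enc in ENCODINGS:
        raw = SAMPLE.encode(enc)
        points = " ".join(f"U+{cp:04X}" for cp in code_points(raw, enc))
        # The bytes differ per encoding; the code points do not.
        print(f"{enc:10} bytes={raw.hex(' ')}  code points={points}")
```

The byte sequences differ, but the decoded code points are identical. The renderer only ever looks up code points in the font.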

Your problem with '\' is that MS Mincho is trying to be both a Unicode font
and a Japanese font. This has resulted in an incorrect code point
assignment, evidently in a convoluted and (IMNSHO) misguided attempt at
backward compatibility. The font should have the correct code point to glyph
mapping, and the application should know what character set it is asking the
system to render, but this is not always the case today.

Although MS Mincho is supposed to be a Unicode font, it improperly assigns
the yen sign to code point U+005C Reverse Solidus. The same problem exists
in MS Gothic and MS PGothic. Microsoft's Korean fonts, including
Batang, BatangChe, Dotum, DotumChe, Gulim, GulimChe, Gungsuh, and GungsuhChe,
have the won currency symbol at this code point.
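For reference, the three characters involved are distinct in Unicode, and each currency sign has a code point of its own. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a character's code point and its official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # Backslash, yen sign, and won sign are three separate characters.
    for ch in ("\\", "\u00a5", "\u20a9"):
        print(describe(ch))
```

A font that draws a yen or won glyph for U+005C changes only what is displayed. The stored character is still a reverse solidus, so the Perl source itself is unaffected.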

I don't understand the problem with ')', which is the same in ASCII and MS
Mincho.

Microsoft fonts that cover the CJK Unified Ideographs block and also have
correct glyphs for '\' include:

Arial Unicode MS
MingLiU
PMingLiU
SimHei
SimSun

-----Original Message-----
From: Tim Scott [mailto:gulbrain(_at_)yahoo(_dot_)co(_dot_)uk]
Sent: Wednesday, November 21, 2001 8:42 AM
To: Martin Duerst
Cc: perl-unicode(_at_)perl(_dot_)org
Subject: Re: UTF-16 -> UTF-8


Martin,
Thanks - MS Mincho looks interesting.
What I found, though, was that some of the punctuation doesn't appear as
expected. For example a ')' appears as a centralised dot and a '\' appears
as a Yen symbol. Not terribly good for writing PERL !
Also - the glyphs looked slightly different : do you know if it's a big- or
little-endian UTF-16 font or a UTF-8 font ?
Ideally I'd like to use a UTF-8 font.
Apparently there's an MS Gothic ...
Thanks,
Tim
  Martin Duerst <duerst(_at_)w3(_dot_)org> wrote:
At 17:25 01/11/20 +0100, Philip Newton wrote:
PS: Does anyone know of - even an odd looking

It would look really, really, odd.


- Fixed pitch Unicode font
including Western European, CJK, Cyrillic and Greek glyphs (ie: most Left
to Right data) ? It's not for an end-user, it's for techies like myself,
so it doesn't need to be brilliant, just more distinctive than a set of
squares or blocks !

I think MS Mincho (that came with Japanese language pack for MSIE 3.0, I
think) is fixed-width and has Western, Cyrillic, and Greek glyphs --
and, of course, a large assortment of CJK. But I've only used it for CJK
so I can't say for sure.

Yes, some font covering Japanese would be a good start. You would have
to copy the glyphs used for fullwidth ASCII to also be used for plain
ASCII, and then add whatev
