perl-unicode

RE: UTF-8 in web pages

2001-08-06 12:49:41
Hello-

Two comments/questions:

1)

>> UNICODE is a character encoding ...

> Wrong.  Unicode is not a character encoding.  There are many different
> character encodings which are used to encode Unicode, notably UTF-8,
> ucs16 and Perl's own UTF-8-like encoding.

I was not aware that Perl's encoding is "UTF-8-like". How does it differ
from standard UTF-8? Could it cause any problems?
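To illustrate the quoted point that Unicode defines characters (code points) while UTF-8 and UTF-16 are different byte encodings of them, here is a small sketch in Python. Python is chosen only for illustration; the sample string is arbitrary, and the sketch does not attempt to model Perl's internal format.

```python
def code_points(s: str) -> list[str]:
    """List the abstract Unicode code points, independent of any encoding."""
    return [f"U+{ord(c):04X}" for c in s]

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, U+65E5 CJK UNIFIED IDEOGRAPH-65E5
text = "é日"

utf8 = text.encode("utf-8")       # variable length: 2 bytes, then 3 bytes
utf16 = text.encode("utf-16-be")  # 2 bytes each here; big-endian, no byte-order mark

print(code_points(text))  # ['U+00E9', 'U+65E5']
print(utf8.hex())         # c3a9e697a5
print(utf16.hex())        # 00e965e5

# Different bytes, same characters: both decode back to the same string.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == text
```

The code points stay the same; only the byte representation changes with the encoding.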


2)

>> A big problem in Win32 browsers with multi-byte encodings, including UTF-8
>> (MS and new Netscape, although not old Netscape), is that you cannot copy
>> characters out of the browser. (MS seems to copy into Word, but all my
>> other programs receive question marks.)

> Not really. All M$-Win32 browsers support UTF-8 flawlessly.

I'm glad UTF-8 display is working well for you. Unfortunately, we have run
into a few UTF-8 bugs in Netscape Navigator that were a problem for our
application. I hope it's not too far off-topic to list the two big ones, in
case other readers also run into them:

- There are some non-negligible setup issues (nicely detailed in one of Alan
Wood's Unicode Resource Pages,
http://www.hclrss.demon.co.uk/unicode/netscape.html). Unfortunately, you
have to set up a *single* "Unicode" font for all characters, whereas most
machines just have fonts for specific scripts lying around (e.g., a font
for Japanese, a font for Korean, etc.). If you select a Japanese font for
Unicode display, it will work fine for Japanese and any other characters in
that font, but it won't be able to display Korean or other characters that
aren't included. Netscape misbehaved and crashed a lot when I tried to get
it to use the huge Microsoft Unicode font with all of the characters.

- Text input boxes in forms only accept characters in the "ANSI" system code
page. You can't enter or edit Japanese data, even if you can display it.
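The input-box limitation above can be illustrated with a short Python sketch of what a single-byte "ANSI" code page can and cannot hold. It assumes a Western European Windows system whose ANSI code page is cp1252; it does not show how Netscape itself converts text, which the post does not describe. The helper names `to_code_page` and `fits_code_page` are hypothetical, chosen for illustration. The lossy conversion also shows one plausible way question marks appear when text passes through such a code page.

```python
# Assumption: a Western European Windows install, ANSI code page cp1252.
ANSI_CODE_PAGE = "cp1252"

def fits_code_page(text: str, codec: str = ANSI_CODE_PAGE) -> bool:
    """Hypothetical helper: True if every character exists in the code page."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def to_code_page(text: str, codec: str = ANSI_CODE_PAGE) -> bytes:
    """Hypothetical helper: lossy conversion; unrepresentable characters become '?'."""
    return text.encode(codec, errors="replace")

print(fits_code_page("café"))            # True: é exists in cp1252
print(fits_code_page("日本語"))           # False: no Japanese in cp1252
print(fits_code_page("日本語", "cp932"))  # True: the Japanese Windows code page
print(to_code_page("日本語 ok"))          # b'??? ok'
```

Which characters survive depends entirely on the system code page, which is why the same page can accept Japanese input on a Japanese Windows install and not on a Western one.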

IE5 does not have these problems and, of course, it's much more stable. If
you (and the people viewing your pages) can stick to IE5, UTF-8 shouldn't be
a problem.

Mark Lewellen
