perl-unicode

Re: perlunicode comment - when Unicode does not happen

2003-12-23 07:30:04
On Tue, 23 Dec 2003, Nick Ing-Simmons wrote:

Jungshik Shin <jshin(_at_)mailaps(_dot_)org> writes:
On Mon, 22 Dec 2003, Jarkko Hietaniemi wrote:

(AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for
filenames, but because of backward compatibility reasons using 8-bit
codepages is much more likely.

 No. _Both_ NTFS (only supported by Win 2k/XP) and VFAT (supported by
Win 2k/XP and Win 9x/ME) use UTF-16LE **exclusively**.

But those OSes also support older file systems (e.g. floppies),
and shares where things are not as clear (at least to me).

  In the case of floppies (FAT), I guess we're just back to the old days :-)
In the case of CIFS, I really have to check. Then again, Windows even
supports NFS (although not for free) and other file sharing ... things
become fuzzy.


In that respect, Windows filesystems are 'saner' than Unix file systems.
The APIs for accessing them come in two flavors, though, 'A' APIs and
'W' APIs, as I explained in another message of mine.

In that message you mentioned a .dll - should perl look for and
link to that DLL?

 Actually, I mentioned three different possibilities. Only one of them
relies on MSLU (Microsoft Layer for Unicode). If you go that route, you
need just a single binary that works across Win32 platforms. However,
MSLU has to be present on the system.

 The second strategy is to do what Mozilla does: 1. write a set of
wrapper functions that emulate the Windows 'W' APIs, 2. detect the OS at
run-time (Windows 9x/ME vs Windows 2k/XP), 3. call either the emulated
versions of the 'W' APIs or the native 'W' APIs (I'm omitting details
here, but you should get the idea). This is actually similar to what
MSLU does, but you don't have to rely on MSLU.
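
 To make the run-time dispatch idea a bit more concrete, here is a rough
sketch in Python rather than real Win32 code. All the names in it
(native_open_w, make_emulated_open_w, select_open) are made up for
illustration, and returning the encoded bytes is only a stand-in for
actually calling a 'W' or 'A' API. It is not how Mozilla or MSLU
implement it, just the shape of the approach.

```python
from typing import Callable


def native_open_w(name: str) -> bytes:
    # Stand-in for a native 'W' call (Win 2k/XP): the file name is
    # handed over as UTF-16LE, so any Unicode name goes through.
    return name.encode("utf-16-le")


def make_emulated_open_w(codepage: str) -> Callable[[str], bytes]:
    # Builds a stand-in for an emulated 'W' call (Win 9x/ME): the
    # wrapper converts the name to the system codepage and would then
    # call the 'A' API with the resulting bytes.
    def emulated_open_w(name: str) -> bytes:
        # Raises UnicodeEncodeError if the name is outside the codepage.
        return name.encode(codepage)

    return emulated_open_w


def select_open(is_nt_family: bool, codepage: str = "cp1252") -> Callable[[str], bytes]:
    # Step 2 and 3 of the strategy: decide once, at run-time, which
    # implementation the rest of the program should call.
    if is_nt_family:
        return native_open_w
    return make_emulated_open_w(codepage)
```

 The caller only ever sees one function, e.g. open_file = select_open(False),
and does not need to know which family of Windows it is running on.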

 The final approach is to build two separate binaries, one for Win 9x/ME
(with 'A' APIs) and the other for Win 2k/XP (with 'W' APIs).

  In all three cases, the character repertoire (that can be used for
file names) on Win 9x/ME is limited to that of the system codepage. It
may sound odd because VFAT can cover the whole Unicode repertoire. Don't
ask me why, but that's the way Win 9x/ME works. That may explain why
Jarkko got confused.  If somebody hacks VFAT and writes her own VFAT IO
functions, the full range of Unicode can be used even on Win 9x/ME.
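
  As a small illustration of the codepage limit (again a Python sketch,
not Win32 code; fits_codepage is a made-up helper and cp1252/cp949 are
just example system codepages): a Korean file name fits in UTF-16LE and
in the Korean codepage, but not in a Western European one.

```python
def fits_codepage(name: str, codepage: str) -> bool:
    # True if every character of the name can be represented in the
    # given encoding, i.e. the name is inside that repertoire.
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


korean_name = "\ud55c\uae00.txt"  # two Hangul syllables plus ".txt"

on_disk = fits_codepage(korean_name, "utf-16-le")  # what VFAT can store
korean_9x = fits_codepage(korean_name, "cp949")    # Korean Win 9x/ME
western_9x = fits_codepage(korean_name, "cp1252")  # Western Win 9x/ME
```

  Here on_disk and korean_9x come out True and western_9x comes out
False, which is the situation described above.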

  Jungshik