perl-unicode

RE: Unicode filenames on Windows with Perl >= 5.8.2

2004-06-21 09:30:05
On Mon, 21 Jun 2004, Steve Hay wrote:
Jan Dubois wrote:
You need to call CreateFileW() to open a file with a Unicode name. If
you want to hack something, then I would suggest writing a little XS
module that just swaps out the file handle in a PerlIO* structure.
Look at PerlIOWin32_open() in win32/win32io.c to see how Perl
currently opens a file.
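
As a rough illustration of the CreateFileW() route, here is a minimal C
sketch. It assumes the file name arrives as NUL-terminated UTF-8; the helper
names utf8_to_utf16() and open_utf8_name_for_read() are made up for this
example and are not part of Perl or of win32/win32io.c. The converter does
only minimal validation (it does not reject overlong forms), and on Windows
MultiByteToWideChar() with CP_UTF8 could be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: convert NUL-terminated UTF-8 into NUL-terminated
 * UTF-16. Returns the number of UTF-16 code units written (excluding the
 * terminator), or -1 on malformed input or if dst is too small. */
static int utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;
    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* invalid lead byte */
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return -1;            /* truncated or broken sequence */
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp > 0x10FFFF)
            return -1;
        if (cp >= 0x10000) {          /* outside the BMP: surrogate pair */
            if (n + 2 >= cap)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap)
                return -1;
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return (int)n;
}

#ifdef _WIN32
/* Hypothetical wrapper: open an existing file for reading by its UTF-8 name. */
static HANDLE open_utf8_name_for_read(const char *utf8_name)
{
    uint16_t wide[MAX_PATH];

    if (utf8_to_utf16(utf8_name, wide, MAX_PATH) < 0)
        return INVALID_HANDLE_VALUE;
    return CreateFileW((LPCWSTR)wide, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}
#endif
```

The resulting HANDLE would still have to be attached to a PerlIO* by the XS
module; that part is not shown here.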

I really need all of Perl's filename handling to be Unicode-savvy, not
just open(). Or have I misunderstood you?

No, you are correct.  I assumed you just wanted to solve a specific problem,
reading a bunch of files with Unicode names.

Another quick-and-dirty "solution" would be to build a custom Perl by
hacking win32/win32.h. If you change the USING_WIDE definition to "1"
then you end up with a version of Perl that has the old "-C" behavior
hardcoded. Remember that this is not really compatible with Perl's
Unicode handling.

Reading a previous e-mail from you on this subject
(http://www.mail-archive.com/perl-unicode(_at_)perl(_dot_)org/msg02127.html),
it seems that there are at least four issues with the old "-C" behaviour:

1. It didn't do anything with the UTF8 flag in SVs;
2. There are no wide API functions on Win95/98/ME;
3. Some core Perl APIs take char *s, not SV *s;
4. Non-core modules would be affected too.

I would guess that 1 is maybe not too much work? (Just a wild guess -
I don't really know.)

Probably, but it relies on 3 being implemented first. The char* doesn't
carry the UTF8 flag.
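
To illustrate why the flag has to travel with the string, here is a small
standalone C sketch. The flagged_name struct and name_char_count() are
hypothetical stand-ins invented for this example; they are not Perl's SV or
any Perl API. The same two bytes count as one character or as two depending
on a flag that a bare char* has no place to store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "a string plus its UTF8 flag" (not Perl's SV). */
typedef struct {
    const char *bytes;
    size_t      len;
    int         is_utf8;  /* nonzero: bytes are UTF-8; zero: one byte per char */
} flagged_name;

/* Count characters, interpreting the bytes according to the flag. */
static size_t name_char_count(const flagged_name *name)
{
    size_t i, count = 0;

    assert(name != NULL);
    if (!name->is_utf8)
        return name->len;
    for (i = 0; i < name->len; i++) {
        /* every byte that is not a UTF-8 continuation byte starts a character */
        if (((unsigned char)name->bytes[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

A function that receives only the bytes pointer cannot tell these two cases
apart, which is why 1 depends on 3.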

I must confess that 2 doesn't really bother me since the "9x" type
systems are now a thing of the past (XP onwards are all "NT" type
systems, even XP Home Edition).

While I also wish that Win 9x would just cease to exist, I don't think
any core Perl patches would be accepted if they would render Perl
inoperable on those systems. You would have to provide at least a
fallback solution, even if it means creating separate binaries for "9x"
and "NT" Windows systems.

How much work is involved in 3?

Perl internals pretend to use C runtime routines like open(), fstat()
etc. but reimplement them on some systems to get a consistent behavior.
You will need to define a different API that uses SV* instead of char*
for all file/directory name arguments and use that one exclusively. Of
course they all need to be indirected through the PERL_IMPLICIT_SYS
system so that they can be redefined for individual operating systems.
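
The shape of such an indirected API might look roughly like the following C
sketch. All names here (file_name, name_sys, select_name_sys and the two
back ends) are hypothetical and do not reproduce the real PERL_IMPLICIT_SYS
structures; the back ends are stubs that only report which implementation
would run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file name type that carries its own UTF8 flag. */
typedef struct {
    const char *bytes;
    size_t      len;
    int         is_utf8;
} file_name;

enum { BACKEND_NARROW = 1, BACKEND_WIDE = 2 };

/* Hypothetical table of file name operations, one per operating system. */
typedef struct {
    int (*open_name)(const file_name *name, int flags);
} name_sys;

/* Stub back end for systems without wide API functions. */
static int narrow_open(const file_name *name, int flags)
{
    assert(name != NULL);
    (void)flags;
    return BACKEND_NARROW;  /* a real version would call the "A" function */
}

/* Stub back end for systems that have wide API functions. */
static int wide_open(const file_name *name, int flags)
{
    assert(name != NULL);
    (void)flags;
    return BACKEND_WIDE;    /* a real version would call the "W" function */
}

static const name_sys narrow_sys = { narrow_open };
static const name_sys wide_sys   = { wide_open };

/* Pick a table once at startup; callers only ever go through the table. */
static const name_sys *select_name_sys(int have_wide_api)
{
    return have_wide_api ? &wide_sys : &narrow_sys;
}
```

Selecting the table at run time is also one way a single binary could fall
back to the narrow calls on "9x" systems.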

I'm not sure how much work the implementation is, but I'm afraid you
would also need to spend significant time arguing about it.

Regarding 4, is it only Win32 modules that would be affected (where
"A" functions would need replacing with "W" functions), or would
others be affected too?

Others would be affected too, because they use the redefined open()
function from Perl if they are opening a file. Of course they would need
to be changed anyway to make them support Unicode filenames, so maybe this
isn't an issue. You would still need to provide an "ASCII" interface for
the new Perl Unicode API so that you could funnel the module's open() call
through it.
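
Such an "ASCII" shim could be as thin as the following C sketch. The names
file_name, open_name() and compat_open() are hypothetical, and open_name()
is a stub that merely records what it was given so that the funnelling is
visible; it does not open anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical file name type that carries its own UTF8 flag. */
typedef struct {
    const char *bytes;
    size_t      len;
    int         is_utf8;
} file_name;

/* What the stub below saw on its most recent call. */
static size_t last_seen_len     = 0;
static int    last_seen_is_utf8 = -1;

/* Stub for the new flag-aware API: records its argument, returns 0. */
static int open_name(const file_name *name)
{
    assert(name != NULL);
    last_seen_len     = name->len;
    last_seen_is_utf8 = name->is_utf8;
    return 0;
}

/* Compatibility entry point for callers that only have a char*.
 * The name is passed on with the UTF8 flag off. */
static int compat_open(const char *path)
{
    file_name name;

    assert(path != NULL);
    name.bytes   = path;
    name.len     = strlen(path);
    name.is_utf8 = 0;
    return open_name(&name);
}
```

Existing modules would keep calling the char* entry point and behave as they
do today, while converted code calls the flag-aware function directly.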

Given that 3 at least would probably break binary compatibility, I
guess this sort of thing won't be done any sooner than 5.10 at the
earliest, but having something done in time for that would be great.
Is that a realistic possibility, or just wishful thinking?

I think it is possible, but it requires someone to both do the work and
to argue for it on P5P. Without this "champion", I don't see it
happening at all.

Cheers,
-Jan