perl-unicode

Re: Info required - "Wide API calls" in Win32 Perl >= 5.8.2

2004-02-19 14:30:04
On Thu, 19 Feb 2004 22:03:14 +0200, Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> wrote:

After switching from perl 5.8.0 build 806 to perl 5.8.2 build 808
I found that the ability to invoke Win32 wide api calls was silently
removed (-C command line switch or ${^WIDE_SYSTEM_CALLS}=1).

Well, not *silently* removed.  It is mentioned in the change notes
(perldelta) of the Perl 5.8.1 standard distribution; I don't know
where exactly those notes are in the ActivePerls.

It is in perl581delta.  It looks like it is still missing from the table of
contents, even though I thought I fixed that in build 809. :(

The removal was done at the suggestion of the ActivePerl maintainers
since they considered the "Wide API" implementation (which they
themselves originally did) broken, and the -C was "recycled"
for other Unicodeish purposes.  I am not familiar with the exact details
of what was broken with the -C as it was.

The -C option was implemented before Perl had proper Unicode support.  The
implementation of -C (the code is still there, it is just disabled) does
*not* look at the UTF8 flag at all.  It just assumes that the string passed
in is always in UTF8 unless "use bytes" was in effect.  It also stores
strings as UTF8 without setting the correct SV bits.  Therefore it is not
compatible with the Unicode support in Perl.
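
To illustrate why ignoring the flag matters, here is a minimal sketch in
portable C.  It is not Perl's actual internals: toy_sv, toy_char_count and
toy_char_count_assume_utf8 are made-up names, and the struct only imitates
the idea of "bytes plus a UTF8 flag".  The same two bytes are one character
or two depending on the flag, and code that always assumes UTF8 gets the
unflagged case wrong.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Perl scalar: raw bytes plus a UTF-8 flag. */
typedef struct {
    const unsigned char *bytes;
    size_t len;
    int is_utf8;                /* plays the role of the SV's UTF8 flag */
} toy_sv;

/* Count characters the way flag-aware code would. */
size_t toy_char_count(const toy_sv *sv)
{
    size_t i, n = 0;

    assert(sv != NULL);
    if (!sv->is_utf8)
        return sv->len;         /* unflagged: one byte is one character */
    for (i = 0; i < sv->len; i++)
        if ((sv->bytes[i] & 0xC0) != 0x80)  /* skip continuation bytes */
            n++;
    return n;
}

/* Count characters while ignoring the flag and always assuming UTF-8,
 * which is the behaviour described above for the old -C code. */
size_t toy_char_count_assume_utf8(const toy_sv *sv)
{
    toy_sv copy;

    assert(sv != NULL);
    copy = *sv;
    copy.is_utf8 = 1;
    return toy_char_count(&copy);
}
```

With the bytes 0xC3 0xA9, the flag-aware count is 1 when the flag is set and
2 when it is not; the assume-UTF8 variant reports 1 in both cases.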

This change, however, removes not only the possibility to use "UNICODE"
names but also access to files and folders with names longer than 255
bytes.

Support of "long filenames" through the wide API was coincidental and not
consistent.  There are many places in win32/win32.c where buffers are
allocated as MAX_PATH or MAX_PATH+1 characters.  If your filename passes
through any of those routines, it would still be truncated even with -C.
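
The truncation effect can be sketched in a few lines of portable C.  This is
only an illustration, not code from win32/win32.c: TOY_MAX_PATH and
toy_copy_fixed are hypothetical names, and the limit is made tiny on purpose
(the real MAX_PATH is 260 on Win32).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PATH 16   /* deliberately tiny stand-in for MAX_PATH */

/* Copy a path through a fixed-size buffer.  Returns 1 if the path
 * survived intact, 0 if it had to be truncated to TOY_MAX_PATH chars. */
int toy_copy_fixed(const char *path, char out[TOY_MAX_PATH + 1])
{
    size_t len;

    assert(path != NULL && out != NULL);
    len = strlen(path);
    if (len > TOY_MAX_PATH) {
        memcpy(out, path, TOY_MAX_PATH);
        out[TOY_MAX_PATH] = '\0';
        return 0;
    }
    memcpy(out, path, len + 1);   /* includes the terminating NUL */
    return 1;
}
```

Any caller that hands a longer name to such a routine gets a different,
shorter name back, no matter which API is called afterwards.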

Maybe somebody knows:
     a.) about a way to invoke the wide API interface

It is not possible because the USING_WIDE macro is hardcoded as 0 right now
(in win32.h).

     b.) plans about the future of the usage of wide API calls in perl
         for windows (32/64) >  5.8.2.

Vague plans, yes, but nobody has as yet volunteered to implement anything.

I have this on my list of "things I would like to do", but if the last few
months are any indication, I may never get around to it.

You may peruse the archives of the perl-unicode(_at_)perl(_dot_)org list; the
problems of supporting Unicode filenames (in general, not just WinXX) were
discussed a few weeks back.  It's not quite as simple as reverting to using
the W-APIs, I'm afraid.

and
     c.) who is the maintainer of the win32 perl (ActivePerl) port
         of the I/O subsystem.

That would be ActiveState.  Sorry to be so flippant but that's where the
largest pool of Win32 Perl knowledge is.  If they cannot find the resources
to reintroduce a fixed version of the W-APIs, someone else knowledgeable
in WinXX Unicode support must do it.

I once looked into this before.  The problem is not limited to core Perl,
but would also affect e.g. the libwin32 modules.

But even just for core Perl, things become more complicated as long as you
want to support Windows 95/98/Me.  Those platforms (I'm using the term
loosely) do not support the wide APIs.  So you can't always use the wide
APIs whenever an SV has the UTF8 bit set; you'll actually have to try to
downgrade to Latin1 on those platforms.  Microsoft has a Unicode emulation
layer for Win9x, but as far as I can tell, you cannot use it transparently on
Win9x only; you'll have to create separate binaries.  Maybe that is a
reasonable price to pay though.
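
For what "try to downgrade" would involve, here is a rough sketch in portable
C.  It is not the Perl core routine, just a hypothetical helper
(toy_utf8_to_latin1) written under the assumption that the input is a plain
byte buffer: the conversion succeeds only if every character fits in Latin1,
and fails otherwise, which is exactly the case the narrow APIs cannot handle.

```c
#include <assert.h>
#include <stddef.h>

/* Try to convert UTF-8 input to Latin-1.  "out" must have room for
 * at least "len" bytes.  Returns the number of bytes written, or -1
 * if the input is malformed or holds a code point above U+00FF. */
long toy_utf8_to_latin1(const unsigned char *in, size_t len,
                        unsigned char *out)
{
    size_t i = 0;
    long n = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        unsigned char c = in[i];

        if (c < 0x80) {                         /* plain ASCII */
            out[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out[n++] = (unsigned char)(((c & 0x03) << 6)
                                       | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;                          /* does not fit Latin-1 */
        }
    }
    return n;
}
```

A name like "caf" followed by U+00E9 downgrades fine; a name containing
U+263A does not, and the caller then has no narrow-API way to open the file.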

Finally, there are various APIs in Perl that just take a char*, not an SV*.
You'll have to modify these APIs to indicate the encoding as well.  I doubt
you can do this in a manner that maintains binary compatibility.

BTW, long filename support is a separate issue.  It would be nice to
transparently support the \\?\ prefix for really long filenames.  But for
this to happen, *all* buffers holding filenames will have to be dynamically
allocated/resized.
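
As a sketch of the dynamic allocation this implies, the following portable C
helper builds a prefixed name on the heap instead of in a MAX_PATH-sized
array.  toy_add_long_prefix is a hypothetical name, not an existing Perl or
Win32 function, and the prefix is spelled with backslashes as in the Win32
documentation (\\?\); a real implementation would also have to make the path
absolute and convert it for the wide API, which this does not attempt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly malloc'ed copy of "path" with the long-path prefix
 * prepended, or NULL if allocation fails.  The caller must free() it. */
char *toy_add_long_prefix(const char *path)
{
    static const char prefix[] = "\\\\?\\";   /* the 4 chars \\?\ */
    const size_t plen = sizeof(prefix) - 1;
    size_t len;
    char *out;

    assert(path != NULL);
    len = strlen(path);
    out = (char *)malloc(plen + len + 1);     /* sized to fit, no fixed cap */
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, path, len + 1);        /* copies the NUL too */
    return out;
}
```

The point is only that the buffer grows with the name; every routine that
touches the filename afterwards would need the same treatment.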

I'm afraid this is not just a weekend project.

Cheers,
-Jan

PS: I'm not following this list closely, so I may not be aware of what has
been discussed here a few weeks ago. :)