perl-unicode

Re: Info required - "Wide API calls" in Win32 Perl >= 5.8.2

2004-02-19 16:30:08
First, thanks to all for the fast responses to my questions.

Jan Dubois wrote:

On Thu, 19 Feb 2004 22:03:14 +0200, Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> 
wrote:


After switching from perl 5.8.0 build 806 to perl 5.8.2 build 808
I found that the ability to invoke Win32 wide api calls was silently
removed (-C command line switch or ${^WIDE_SYSTEM_CALLS}=1).

Well, not *silently* removed.  It is mentioned in the change notes
(perldelta) of the Perl 5.8.1 standard distribution; I don't know
where exactly those notes are in the ActivePerls.


Actually, when I said "silently" I was not thinking about documentation. Even if it was a little hard to find, at least in ActivePerl's documentation, it was in the "perlrun" doc on command-line switches. But I was not able, even afterwards, to find any discussions or information in advance about the plans and reasons for doing that. Since I had tested the wide API calls with 5.8.0, it took a lot of testing again with > 5.8.0 to realize that the Wide API interface had been removed.


It is in perl581delta.  It looks like it is still missing from the table of
contents, even though I thought I fixed that in build 809. :(


The removal was done at the suggestion of ActivePerl maintainers
since they considered the "Wide API" implementation (which they
themselves originally did) broken, and the -C was "recycled"
for other Unicodeish purposes.  I am not familiar with the exact details
of what was broken with the -C as it was.


The -C option was implemented before Perl had proper Unicode support.  The
implementation of -C (the code is still there, it is just disabled) does
*not* look at the UTF8 flag at all.  It just assumes that the string passed
in is always in UTF8 unless "use bytes" was in effect.  It also stores
strings as UTF8 without setting the correct SV bits.  Therefore it is not
compatible with the Unicode support in Perl.
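
As a rough illustration of the difference described above, here is a minimal sketch in Python (not Perl internals). The `Scalar` type and both function names are hypothetical; it also assumes that a scalar without the UTF8 flag holds Latin-1 octets, which is a simplification of Perl's native 8-bit semantics.

```python
from typing import NamedTuple


class Scalar(NamedTuple):
    """Hypothetical stand-in for a Perl SV: raw octets plus the UTF8 flag."""
    octets: bytes
    utf8: bool


def to_wide_ignoring_flag(sv: Scalar) -> bytes:
    # Old -C style behaviour as described above: always treat the
    # octets as UTF-8, whatever the flag says.
    return sv.octets.decode("utf-8").encode("utf-16-le")


def to_wide_flag_aware(sv: Scalar) -> bytes:
    # Flag-aware behaviour: pick the decoding from the flag.
    # Assumption: unflagged octets are Latin-1.
    text = sv.octets.decode("utf-8" if sv.utf8 else "latin-1")
    return text.encode("utf-16-le")
```

With an unflagged scalar holding the single octet 0xE9, the flag-aware conversion yields one wide character, while the flag-ignoring one fails because 0xE9 alone is not valid UTF-8.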

To me there is no direct relation between the UTF8 flag and the use of wide API calls (see the long file/dir names issue). The only thing that changes is the kind of encoding/decoding required before passing a value to the wide API call, but that should also be true if no wide APIs are used.

The main reason I was so astonished about the removal/disabling of the wide API calls was that I had assumed Perl would silently move to general use of the wide API interface (fixing existing buffer length issues) in all Perl internal functions (environment, ...), at least when the UTF8 flag was detected, rather than dropping them altogether (at least for the moment).



This change however removes not only the possibility to use "UNICODE"
names but also access to files and folders with names longer than 255 bytes.


Support of "long filenames" through the wide API was coincidental and not
consistent.  There are many places in win32/win32.c where buffers are
allocated as MAX_PATH or MAX_PATH+1 characters.  If your filename passes
through any of those routines, it would still be truncated even with -C.


Unfortunately this problem is not unique to Perl. Even most Microsoft OS software (up to Windows Server 2003) has similar problems; we ran into this kind of problem when using roaming user profiles on Win2K and WinXP workstations.

However, this problem is far easier to solve for the file/dir interface than, for example, for registry entry and key names, since there is a clear rule: (PathElement/PathLength) 255/255 chars on ANSI, 255/32,767 WChars on UNICODE/Wide API calls.
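
To make that rule concrete, here is a small Python sketch of such a length check. The limits are taken as stated in this message, not from the Windows documentation; the function name `fits` and the choice to count UTF-16 code units in both modes are assumptions for illustration.

```python
# (max element length, max total path length), as stated in this message.
ANSI_LIMITS = (255, 255)
WIDE_LIMITS = (255, 32767)


def _units(text: str) -> int:
    # Count UTF-16 code units, which is what a WCHAR buffer holds.
    return len(text.encode("utf-16-le")) // 2


def fits(path: str, wide: bool) -> bool:
    """Return True if every element and the whole path are within the limits."""
    max_element, max_total = WIDE_LIMITS if wide else ANSI_LIMITS
    elements = [e for e in path.split("\\") if e]
    return _units(path) <= max_total and all(
        _units(e) <= max_element for e in elements
    )
```

A path made of three 200-character directory names passes the wide check but not the ANSI one, whereas a single 300-character element fails in both modes.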


Maybe somebody knows:
        a.) about a way to invoke the wide API interface


It is not possible because the USING_WIDE macro is hardcoded as 0 right now
(in win32.h).

Thanks for that info. After examining the 5.8.2 source I had already found this, but I didn't dig far enough to determine whether this macro alone controls the "Wide API" call interface.



        b.) plans about the future of  the usage of wide API calls in perl
            for windows (32/64) >  5.8.2.

Vague plans, yes, but nobody has as of yet volunteered to implement anything.


I have this on my list of "things I would like to do", but if the last few
months are any indication, I may never get around to it.


You may peruse the archives of the perl-unicode(_at_)perl(_dot_)org list; the problems of supporting Unicode filenames (in general, not just WinXX) were discussed a few weeks back. It's not quite as simple as reverting back to using the W-APIs,
I'm afraid.



Yes, I read the threads about that; however, I didn't find any conclusions or future plans there.

Also, to me most discussions about Win32/64 Unicode issues on filesystems give the impression of a direct relation between wide API usage and Unicode support at the filesystem level. However, if you take a look at the "Native API" level of Windows NT (3.1 up to 5.1), all API calls use UNICODE_STRING for name values anyway.

and
        c.) who is the maintainer of the win32 perl (ActivePerl) port
            of the I/O subsystem.

That would be ActiveState.  Sorry to be so flippant but that's where the
largest pool of Win32 Perl knowledge is. If they cannot find the resources
to reintroduce a fixed version of the W-APIs, someone else knowledgeable
in WinXX Unicode support must do it.

I sent this query earlier to the ActiveState-hosted perl-win32-porters mailing list (since this issue is not directly a Unicode matter to me) but didn't receive any answers there.

For me personally, the obstacle is more the required knowledge of Perl internals than knowledge of the WinAPI.

Actually, I have created several modules on Win32 API matters myself, such as access to shared memory through memory-mapped files, Unicode-based environment variables with read/write in UTF-8 encoding, and registry and INI files in UTF-8 encoding. Unfortunately these modules are closely tied to our client management framework, so it makes no sense to put them on CPAN. However, I avoided XS and mainly used the WinAPI module as an interface.



I once looked into this before.  The problem is not limited to core Perl,
but would also affect e.g. the libwin32 modules.

But even just for core Perl, things become more complicated as long as you
want to support Windows 95/98/Me.  Those platforms (I'm using the term
loosely) do not support the wide APIs.

This is correct. Still, to stay compatible with older platforms, required functionality on new ones (at least if you use Perl for system management tasks) is sacrificed.

So you can't always use the wide
APIs whenever an SV has the UTF8 bit set; you'll actually have to try to
downgrade to Latin1 in that case.  Microsoft has a Unicode emulation layer
for Win9X, but as far as I can tell, you cannot use it transparently on
Win9x only, you'll have to create separate binaries.  Maybe that is a
reasonable price to pay though.
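
The fallback described above could look roughly like the following Python sketch. The function name and the `wide_api_available` parameter are hypothetical, and "downgrade to Latin1" is taken literally here; a real port might need the active ANSI code page instead.

```python
def encode_filename(name: str, wide_api_available: bool) -> bytes:
    """Encode a NUL-terminated filename for a wide or a narrow system call."""
    if wide_api_available:
        # NT-family case: pass UTF-16LE to the W variant of the call.
        return (name + "\0").encode("utf-16-le")
    try:
        # Win9x-style case: try to downgrade to an 8-bit string.
        return (name + "\0").encode("latin-1")
    except UnicodeEncodeError as exc:
        # The name contains characters the narrow API cannot express.
        raise ValueError(
            f"{name!r} cannot be represented without the wide API"
        ) from exc
```

The downgrade succeeds only for names whose characters all fit in 8 bits; anything else has to be reported as an error rather than silently mangled.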

Finally, there are various APIs in Perl that just take a char*, not an SV*.
You'll have to modify these APIs to indicate the encoding as well.  I doubt
you can do this in a manner that maintains binary compatibility.

BTW, long filename support is a separate issue.  It would be nice to
transparently support the //?/ prefix for really long filenames.  But for
this to happen, *all* buffers holding filenames will have to be dynamically
allocated/resized.

Generally, it's always a good idea to add a leading "\\?\" or "\\?\UNC\" (if not already there) to any path used in wide API calls.
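
A minimal Python sketch of that prefixing step follows, assuming the path is already fully qualified. The function name is hypothetical, and the handling of device paths and relative paths is an illustrative choice.

```python
PREFIX = "\\\\?\\"          # the four characters \\?\
UNC_PREFIX = "\\\\?\\UNC\\"  # \\?\UNC\ for network paths


def add_long_path_prefix(path: str) -> str:
    """Return a fully qualified path with the extended-length prefix added."""
    # Forward slashes are not treated as separators after the prefix.
    path = path.replace("/", "\\")
    if path.startswith(PREFIX) or path.startswith("\\\\.\\"):
        return path  # already prefixed, or a device path: leave alone
    if path.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return UNC_PREFIX + path[2:]
    if len(path) >= 3 and path[0].isalpha() and path[1:3] == ":\\":
        return PREFIX + path
    raise ValueError(f"not a fully qualified path: {path!r}")
```

Note that "." and ".." elements are not resolved once the prefix is present, so a caller would have to normalize the path before prefixing it.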


I'm afraid this is not just a weekend project.


I know, but at least for me it is necessary to get conclusive information about whether and when a "wide API" interface, at least for the I/O layer of Perl, will be reimplemented or updated. If there is no such information, all our existing code has to be ported to another language platform, most probably C++, and that's not an easy or quick task.

Cheers,
-Jan

PS: I'm not following this list closely, so I may not be aware of what has
been discussed here a few weeks ago. :)

regards,

Peter