First, thanks to all for the fast responses to my questions.
Jan Dubois wrote:
On Thu, 19 Feb 2004 22:03:14 +0200, Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi>
wrote:
After switching from perl 5.8.0 build 806 to perl 5.8.2 build 808
I found that the ability to invoke Win32 wide api calls was silently
removed (-C command line switch or ${^WIDE_SYSTEM_CALLS}=1).
Well, not *silently* removed. It is mentioned in the change notes
(perldelta) of the Perl 5.8.1 standard distribution, I don't know
where exactly those notes are in the ActivePerls.
Actually, when I said "silently" I wasn't thinking about documentation
matters. Even if it was a little hard to find, at least in ActivePerl's
documentation, it was in the "perlrun" doc about command-line switches.
But I was not able, even afterwards, to find any discussions or
information about the plans and reasons for doing that in advance.
Since I had tested the wide API calls with 5.8.0, it took a lot of testing
again with > 5.8.0 to realize that the Wide API interface had been removed.
It is in perl581delta. It looks like it is still missing in the table of
content list, even though I thought I fixed that in build 809. :(
The removal was done at the suggestion of ActivePerl maintainers
since they considered the "Wide API" implementation (which they
themselves originally did) broken, and the -C was "recycled"
for other Unicodeish purposes. I am not familiar with the exact details
of what was broken with the -C as it was.
The -C option was implemented before Perl had proper Unicode support. The
implementation of -C (the code is still there, it is just disabled) does
*not* look at the UTF8 flag at all. It just assumes that the string passed
in is always in UTF8 unless "use bytes" was in effect. It also stores
strings as UTF8 without setting the correct SV bits. Therefore it is not
compatible with the Unicode support in Perl.
To me there is no direct relation between the UTF8 flag and the use of
wide API calls (see the long file/dir names issue). The only thing that
changes is the kind of encoding/decoding required before passing a value
to the wide API call, but this should also be true if no wide APIs are
used.
The main reason I was so astonished about the removal/disabling of the
wide API calls is that I had assumed Perl would silently move to general
use of the wide API interface (fixing existing buffer length issues) in
all Perl internal functions (environment, ...), at least when the UTF8
flag was detected, rather than dropping them (at least for the moment)
altogether.
This change however removes not only the possibility to use "UNICODE"
names but also access to files and folders with names longer than 255
bytes.
Support of "long filenames" through the wide API was coincidental and not
consistent. There are many places in win32/win32.c where buffers are
allocated as MAX_PATH or MAX_PATH+1 characters. If your filename passes
through any of those routines, it would still be truncated even with -C.
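To illustrate the truncation described above, here is a minimal sketch in portable C. It is not code from win32/win32.c; the names copy_fixed, copy_dynamic and SKETCH_MAX_PATH are made up for this example, and 260 is the value MAX_PATH has in the Windows SDK headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MAX_PATH (260 in the Windows SDK headers). */
#define SKETCH_MAX_PATH 260

/* Copy `path` into a fixed-size buffer, the pattern described above.
 * Anything beyond SKETCH_MAX_PATH characters is silently cut off.
 * Returns the number of characters actually stored. */
size_t copy_fixed(const char *path, char out[SKETCH_MAX_PATH + 1])
{
    size_t n;

    assert(path != NULL && out != NULL);
    n = strlen(path);
    if (n > SKETCH_MAX_PATH)
        n = SKETCH_MAX_PATH;        /* the truncation happens here */
    memcpy(out, path, n);
    out[n] = '\0';
    return n;
}

/* Copy `path` into a heap buffer sized to fit, so nothing is lost.
 * The caller frees the result. Returns NULL if allocation fails. */
char *copy_dynamic(const char *path)
{
    size_t n;
    char *out;

    assert(path != NULL);
    n = strlen(path);
    out = malloc(n + 1);
    if (out != NULL)
        memcpy(out, path, n + 1);   /* includes the terminating NUL */
    return out;
}
```

A 400-character name passed through copy_fixed comes back 260 characters long, no matter which API variant is called afterwards; copy_dynamic keeps all 400.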
Unfortunately this problem is not limited to Perl. Even most Microsoft
software (up to Windows Server 2003) has similar problems; we ran into
this kind of problem when using roaming user profiles on Win2K and WinXP
workstations.
However, this problem is far easier to solve for the file/dir interface
than, for example, for registry entry and key names, since there is a
clear rule: (PathElement/PathLength) 255/255 chars on ANSI, 255/32,767
WChars on UNICODE/Wide API calls.
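The rule can be written down as a small check. This is only a sketch: the limits are the ones quoted in this message (not verified against the SDK headers), the function name path_fits and the macro names are invented here, and lengths are counted in chars on the assumption of one char per character.

```c
#include <assert.h>
#include <stddef.h>

/* Limits as quoted in the message above (assumptions, not SDK constants). */
#define ELEMENT_MAX   255u     /* longest single path element        */
#define ANSI_PATH_MAX 255u     /* whole path through the ANSI calls  */
#define WIDE_PATH_MAX 32767u   /* whole path through the wide calls  */

/* Return 1 if every backslash-separated element of `path` is at most
 * ELEMENT_MAX long and the whole path is at most `total_max` long,
 * 0 otherwise. */
int path_fits(const char *path, size_t total_max)
{
    size_t element = 0;   /* length of the element being scanned */
    size_t total = 0;     /* length of the whole path so far     */

    assert(path != NULL);
    for (; *path != '\0'; path++) {
        total++;
        if (*path == '\\')
            element = 0;              /* a new element starts */
        else if (++element > ELEMENT_MAX)
            return 0;                 /* single element too long */
    }
    return total <= total_max;
}
```

A path made of two 200-character directory names passes with WIDE_PATH_MAX but fails with ANSI_PATH_MAX, while a single 300-character element fails with either limit.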
Maybe somebody knows:
a.) about a way to invoke the wide API interface
It is not possible because the USING_WIDE macro is hardcoded as 0 right now
(in win32.h).
Thanks for that info. After examining the 5.8.2 source I had already
found this, but I didn't dig far enough to see whether this macro alone
controls the "Wide API" call interface.
b.) plans about the future of the usage of wide API calls in perl
for windows (32/64) > 5.8.2.
Vague plans, yes, but nobody has as of yet volunteered to implement
anything.
I have this on my list of "things I would like to do", but if the last few
months are any indication, I may never get around to it.
You may peruse the archives of the perl-unicode(_at_)perl(_dot_)org list, the
problems of supporting Unicode filenames (in general, not just WinXX) were
discussed a few weeks back. It's not quite as simple as reverting back to
using the W-APIs, I'm afraid.
Yes, I read the threads about that. However, I didn't find any conclusions
or future plans there.
Also, to me most discussions about Win32/64 Unicode issues on filesystems
give the impression of a direct relation between wide API usage and the
support of Unicode at the filesystem level. However, if you take a look at
the "Native API" level of Windows NT (3.1 up to 5.1), all API calls use
UNICODE_STRING for name values anyway.
and
c.) who is the maintainer of the win32 perl (ActivePerl) port
of the I/O subsystem.
That would be ActiveState. Sorry to be so flippant but that's where the
largest pool of Win32 Perl knowledge is. If they cannot find the resources
to reintroduce a fixed version of the W-APIs, someone else knowledgeable
in WinXX Unicode support must do it.
I sent this query earlier to the ActiveState-hosted perl-win32-porters
mailing list (since this issue is not directly a Unicode matter to me)
but didn't receive any answers there.
To me personally, the obstacle is more the required knowledge of Perl
internals than knowledge of the WinAPI.
Actually, I have created several modules on Win32 API matters myself,
such as access to shared memory via memory-mapped files, Unicode-based
environment variables with read/write in UTF-8 encoding, and registry and
ini files in UTF-8 encoding. Unfortunately these modules are closely tied
to our client management framework, so it makes no sense to put them on
CPAN. However, I avoided XS and mainly used the WinAPI module as an
interface.
I once looked into this before. The problem is not limited to core Perl,
but would also affect e.g. the libwin32 modules.
But even just for core Perl, things become more complicated as long as you
want to support Windows 95/98/Me. Those platforms (I'm using the term
loosely) do not support the wide APIs.
This is correct. Still, to stay compatible with older platforms,
required functionality (at least if you use Perl for system management
tasks) is sacrificed on the newer ones.
So you can't always use the wide
APIs whenever an SV has the UTF8 bit set; you'll actually have to try to
downgrade to Latin1 in that case. Microsoft has a Unicode emulation layer
for Win9X, but as far as I can tell, you cannot use it transparently on
Win9x only, you'll have to create separate binaries. Maybe that is a
reasonable price to pay though.
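The "try to downgrade to Latin1" step can be sketched in plain C, independent of Perl's own SV machinery. The function name utf8_to_latin1 is invented for this example; it only shows the decision being described: if every character fits in one Latin-1 byte, the ANSI call can still be used, otherwise the string needs a wide API call.

```c
#include <assert.h>
#include <stddef.h>

/* Convert UTF-8 input to Latin-1 (ISO 8859-1).
 * Returns the number of bytes written to `out`, or -1 if the input holds
 * a character above U+00FF, is malformed, or does not fit into `out`.
 * A return of -1 is the case where only a wide API call would do. */
long utf8_to_latin1(const unsigned char *in, size_t inlen,
                    unsigned char *out, size_t outsz)
{
    size_t i = 0;   /* read position in `in`   */
    size_t o = 0;   /* write position in `out` */

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char b = in[i];
        unsigned char c;

        if (b < 0x80) {
            c = b;                      /* plain ASCII */
            i += 1;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < inlen
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080 .. U+00FF. */
            c = (unsigned char)(((b & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;                  /* above U+00FF or malformed */
        }
        if (o >= outsz)
            return -1;                  /* output buffer too small */
        out[o++] = c;
    }
    return (long)o;
}
```

For example, "café" in UTF-8 (5 bytes) downgrades to 4 Latin-1 bytes, while a name containing the euro sign (U+20AC) makes the function return -1.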
Finally, there are various APIs in Perl that just take a char*, not an SV*.
You'll have to modify these APIs to indicate the encoding as well. I doubt
you can do this in a manner that maintains binary compatibility.
BTW, long filename support is a separate issue. It would be nice to
transparently support the //?/ prefix for really long filenames. But for
this to happen, *all* buffers holding filenames will have to be dynamically
allocated/resized.
Generally it's a good idea to add a leading "\\?\" or "\\?\UNC\" (if
not already there) to any path used in wide API calls.
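A minimal sketch of adding the prefix, in portable C with narrow strings for brevity. The function name add_long_prefix is invented for this example. It assumes the path is already absolute and uses backslashes, since the prefix turns off the usual path normalization; a UNC path \\server\share becomes \\?\UNC\server\share.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `path` carrying the long-path prefix.
 * The caller frees the result. Returns NULL if allocation fails.
 * Assumes `path` is absolute and backslash-separated. */
char *add_long_prefix(const char *path)
{
    const char *prefix;
    const char *rest = path;
    size_t plen, rlen;
    char *out;

    assert(path != NULL);
    if (strncmp(path, "\\\\?\\", 4) == 0 || strncmp(path, "\\\\.\\", 4) == 0) {
        prefix = "";                  /* already prefixed, or a device path */
    } else if (strncmp(path, "\\\\", 2) == 0) {
        prefix = "\\\\?\\UNC\\";      /* \\server\share -> \\?\UNC\server\share */
        rest = path + 2;              /* drop the two leading backslashes */
    } else {
        prefix = "\\\\?\\";           /* C:\dir -> \\?\C:\dir */
    }

    plen = strlen(prefix);
    rlen = strlen(rest);
    out = malloc(plen + rlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, rest, rlen + 1);   /* includes the terminating NUL */
    return out;
}
```

The result is allocated to fit, so the helper itself does not reintroduce a fixed-size buffer.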
I'm afraid this is not just a weekend project.
I know, but at least for me it is necessary to get conclusive information
about whether and when a "wide API" interface, at least for the I/O layer
of Perl, will be reimplemented or updated. If there is no such
information, all our existing code has to be ported to another language
platform, most probably C++, and that's not an easy or quick task.
Cheers,
-Jan
PS: I'm not following this list closely, so I may not be aware of what has
been discussed here a few weeks ago. :)
regards,
Peter