perl-unicode

Re: Unicode filenames on Windows with Perl >= 5.8.2

2004-06-21 02:30:06
Jan Dubois wrote:

I'm trying to figure out if I can handle Unicode filenames on 
Windows using Perl 5.8.4, and if so, how.
   


[...]

 

So my question is: How can I deal with these files?

I've tried using Perl scalars containing UTF-8, UTF-16LE and 
UTF-16BE encodings of the filenames, but none of them work 
either.  Indeed, if I try to write a new file with a name 
constructed in those ways, then the name of the file actually 
created is simply the sequence of bytes that make up those encodings.
   


I don't think this is possible from Perl code right now.  

I feared as much :(

Are there any plans to make this possible in Perl?  Most of my 
colleagues here work in C/C++ and have recently completed a large 
project to greatly improve i18n in our C/C++ software, including Unicode 
filenames.  Not being able to do likewise with our Perl code leaves me 
looking rather stupid.  It's really not very good for Perl advocacy, to 
say the least.

You need to
call CreateFileW() to open a file with a Unicode name.  If you want to
hack something, then I would suggest writing a little XS module that
just swaps out the file handle in a PerlIO* structure.  Look at
PerlIOWin32_open() in win32/win32io.c to see how Perl currently opens
a file.
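
For illustration only, here is a minimal C sketch of what "call CreateFileW()" amounts to, assuming the filename is held as UTF-8 bytes. The helper names utf8_to_utf16() and open_utf8_name() are made up for this example and are not part of Perl or the Win32 API; on Windows the conversion could presumably be done with MultiByteToWideChar(CP_UTF8, ...) instead of by hand. The decoder is simplified and does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into
 * UTF-16 code units, the form the "W" functions expect.
 * Returns the number of units written (excluding the terminating 0),
 * or -1 on malformed input or if dst is too small. */
static long utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return -1;   /* truncated sequence */
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp > 0x10FFFF) return -1;

        if (cp >= 0x10000) {          /* outside the BMP: surrogate pair */
            if (n + 2 >= cap) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return -1;
            dst[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return -1;
    dst[n] = 0;
    return (long)n;
}

#ifdef _WIN32
/* Hypothetical helper: open an existing file for reading by its
 * UTF-8 name, going through the wide API rather than the "A" one. */
static HANDLE open_utf8_name(const char *utf8_name)
{
    uint16_t wide[MAX_PATH];

    if (utf8_to_utf16(utf8_name, wide, MAX_PATH) < 0)
        return INVALID_HANDLE_VALUE;
    return CreateFileW((LPCWSTR)wide, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}
#endif
```

An XS module along the lines suggested above would still have to attach the resulting handle to a PerlIO* structure, which this sketch does not attempt.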

I really need all of Perl's filename handling to be Unicode-savvy, not 
just open().  Or have I misunderstood you?


Another quick-and-dirty "solution" would be to build a custom Perl
by hacking win32/win32.h.  If you change the USING_WIDE definition
to "1" then you end up with a version of Perl that has the old "-C"
behavior hardcoded.  Remember that this is not really compatible with
Perl's Unicode handling.

Reading a previous e-mail from you on this subject 
(http://www.mail-archive.com/perl-unicode@perl.org/msg02127.html), it 
seems that there are at least four issues with the old "-C" behaviour:

1. It didn't do anything with the UTF8 flag in SVs;
2. There are no wide API functions on Win95/98/ME;
3. Some core Perl APIs take char *'s, not SV *'s;
4. Non-core modules would be affected too.

I would guess that 1 is maybe not too much work?  (Just a wild guess - I 
don't really know.)

I must confess that 2 doesn't really bother me since the "9x" type 
systems are now a thing of the past (XP onwards are all "NT" type 
systems, even XP Home Edition).

How much work is involved in 3?

Regarding 4, is it only Win32 modules that would be affected (where "A" 
functions would need replacing with "W" functions), or would others be 
affected too?

Given that 3 at least would probably break binary compatibility, I guess 
this sort of thing won't be done any sooner than 5.10 at the earliest, 
but having something done in time for that would be great.  Is that a 
realistic possibility, or just wishful thinking?

- Steve


