perl-unicode

Re: Is perl unicode or not?

2002-10-13 09:30:06
Nadim <nadim(_at_)khemir(_dot_)net> writes:
On Sunday 13 October 2002 14:45, Nick Ing-Simmons wrote:
I am using 5.6.3 on Windows from ActiveState. I do the
following.

I don't think you are. As far as I am aware there is only perl5.6.1;
there isn't a .3 subversion yet.

Sorry for the typo: 5.6.1.63.

my $ole_object = ..... ;
my $unicode_string = $ole_object->GetUnicodeString() ;

OLE objects are a Win32 thing. You would be better off asking on
one of the Win32-aware ActiveState lists. We would at least need
to know how you created $ole_object so we can look up the code
that gets the string.
I wrote the OLE object. The string it sends back is a Unicode string. I
call other functions on the object and they behave correctly.

OK, so as it is your code you can make it do the right thing, which may
not be what you think at first.

Can you share with us the C code fragment that returns the string to
perl as a "scalar value" (SV)? If you are not doing that (but leaving it to
Win32::OLE), can you give the "signature" of the ->GetUnicodeString
method that it is wrapping?

A string in perl has a "PV" (pointer value), which is a sequence of
bytes (octets). In perl5.6 and later, perl can be told to interpret
them in one of two ways:
  1. Like all previous perls as 1-byte/char with same repertoire 
     as iso-8859-1.
  2. As UTF-8 representing Unicode (some mainframes use UTF-EBCDIC 
     but that is not an issue here).
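To make the two interpretations concrete, here is a small sketch in plain C (no perl headers; the function names are made up for illustration) that counts characters in the same byte buffer both ways. The two bytes 0xC3 0xA9 are two characters ("Ã©") under interpretation (1), but a single U+00E9 "é" under interpretation (2). The UTF-8 counter assumes its input is already valid UTF-8 and does no error checking.

```c
#include <assert.h>
#include <stddef.h>

/* Interpretation (1): one byte per character, ISO-8859-1 repertoire,
   so the character count is simply the byte count. */
size_t count_chars_latin1(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    return len;
}

/* Interpretation (2): the same bytes read as UTF-8. Each character has
   exactly one lead byte; continuation bytes look like 10xxxxxx, so we
   count every byte that is NOT a continuation byte.
   Assumes the input is valid UTF-8 (hypothetical helper, no validation). */
size_t count_chars_utf8(const unsigned char *bytes, size_t len)
{
    size_t count = 0;
    assert(bytes != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((bytes[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Pure ASCII gives the same count either way, which is why older perls and UTF-8-aware perls agree on plain text.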

So to return "Unicode" to perl you must use form (2). That is, the
Unicode code points must be UTF-8 encoded, and you must call SvUTF8_on(sv)
on the SV.
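The encoding half of that is mechanical. Below is a minimal sketch in C of turning one code point into UTF-8 bytes; in an XS routine the resulting buffer would then go into an SV (e.g. via newSVpvn) before SvUTF8_on is called on it. The name utf8_encode is hypothetical, and the assert simply rejects values that are not Unicode scalar values (above U+10FFFF, or in the surrogate range) rather than handling them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written (1 to 4). Hypothetical helper. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    /* Only scalar values are encodable: no surrogates, nothing past U+10FFFF. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+00E9 encodes as the two bytes 0xC3 0xA9, and U+20AC (the euro sign) as 0xE2 0x82 0xAC.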

This is different from Win32's normal treatment of Unicode - which is 
to use 16-bit "wide characters" from the UCS-2 repertoire of Unicode
(I have been told that Win2k and later use UTF-16 to give the full
Unicode repertoire at the expense of using surrogate pairs).
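If the string does come back as UTF-16, a character outside the 16-bit range arrives as two code units (a surrogate pair) that have to be combined into one code point before it can be UTF-8 encoded; code that assumes UCS-2 would treat them as two separate characters. A minimal C sketch of decoding one code point from such a buffer follows. The name utf16_next is hypothetical, and mapping an unpaired surrogate to U+FFFD is one common policy, not the only one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from a buffer of UTF-16 code units, starting at
   *pos and advancing *pos past the 1 or 2 units consumed.
   An unpaired surrogate is returned as U+FFFD (REPLACEMENT CHARACTER). */
uint32_t utf16_next(const uint16_t *units, size_t len, size_t *pos)
{
    assert(*pos < len);
    uint16_t hi = units[(*pos)++];

    if (hi >= 0xD800 && hi <= 0xDBFF) {            /* high (lead) surrogate */
        if (*pos < len) {
            uint16_t lo = units[*pos];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {    /* low (trail) surrogate */
                (*pos)++;
                /* 10 bits from each half, offset by 0x10000 */
                return 0x10000 + (((uint32_t)(hi - 0xD800) << 10)
                                  | (uint32_t)(lo - 0xDC00));
            }
        }
        return 0xFFFD;                             /* lead without trail */
    }
    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return 0xFFFD;                             /* trail without lead */
    return hi;                                     /* plain BMP character */
}
```

For instance, the pair 0xD83D 0xDE00 decodes to the single code point U+1F600.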


->GetUnicodeString has converted things it does not understand to '?'.
GetUnicodeString doesn't convert anything. Did you mean perl converted
things it didn't understand?

print $unicode_string ;
# prints ??????????????? on the console

Hmm, as perl5.6 does not have "smart" Unicode IO (perl5.8 does),
this suggests that the string actually is '?' x 17, i.e. you got "junk"
back from the OLE call.
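The point of the '?' diagnosis is that once a string has been narrowed to a one-byte-per-character repertoire, every character that does not fit has already become a literal '?', and nothing downstream can recover it. A minimal C sketch of such a lossy narrowing is below. The ISO-8859-1 target and the function name are illustrative assumptions; this is not claimed to be what Win32::OLE or Windows actually does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow a sequence of code points to one byte each in the ISO-8859-1
   repertoire. Anything outside U+0000..U+00FF cannot be represented and
   is replaced by '?', irreversibly. out must hold len bytes.
   Returns how many characters were replaced. Hypothetical helper. */
size_t narrow_to_latin1(const uint32_t *cps, size_t len, unsigned char *out)
{
    size_t replaced = 0;
    assert(cps != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (cps[i] <= 0xFF) {
            out[i] = (unsigned char)cps[i];   /* fits: keep as-is */
        } else {
            out[i] = '?';                     /* does not fit: information lost */
            replaced++;
        }
    }
    return replaced;
}
```

A string made entirely of, say, Greek or CJK characters comes out as nothing but question marks, one per character.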
I don't think so. The OLE object behaves correctly (I tested it from a C++ app),
though now Win32::OLE is also involved.

It is Win32::OLE that _may_ be doing (or not doing) the conversion.
Which version of it are you using?

2/ read a Unicode string from a file
   For perl5.6 the file has to be in UTF-8 and you need to do some hackery
   (which was so horrible I can't recall it).
Did you see the hackery on this mailing list?

Possibly, a _long_ time ago when perl5.6 was being developed, though more likely
on the perl5-porters(_at_)perl(_dot_)org list. Nonetheless, there are perl5.6
users on this list who no doubt still use it.

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
