perl-unicode

Re: Is perl unicode or not?

2002-10-13 08:30:04
On Sunday 13 October 2002 14:45, Nick Ing-Simmons wrote:
I am using 5.6.3 on Windows from ActiveState. I do the
following.

I don't think you are. As far as I am aware there is only perl5.6.1;
there isn't a .3 subversion yet.

Sorry for the typo: 5.6.1.63.

my $ole_object = ..... ;
my $unicode_string = $ole_object->GetUnicodeString() ;

OLE objects are a Win32 thing. You would be better off asking on
one of the Win32-aware ActiveState lists. We would at least need
to know how you created $ole_object so we can look up the code
that gets the string.
I wrote the OLE object. The string it sends back is a Unicode string. I
call other functions on the object and they behave correctly.

print length($unicode_string), "\n" ;
# prints 17, which is the length of the unicode string

Cool - but are you sure you got the real string?
Yes and no, that's the whole question. I know I send a Unicode string from
the object. If perl doesn't mangle my string into something else, then it
is a Unicode string. (I have the OLE object write the string into a file
at the same time, and it is as Japanese as can be.)

use bytes () ;
print bytes::length($unicode_string), "\n" ;
# prints 17; the string is Japanese, so I expected 34

The bytes:: hackery is _very_ confusing to all concerned.
It returns the length the string happens to be in perl's internal
encoding. That may be either iso-8859-1 or UTF-8. If the original
"japanese" happened to be all iso-8859-1, even though it used to be
2 bytes/char, it will (normally) be held by perl as 1 byte per char.
You will also get 1 byte/char if (as I suspect is happening here)
->GetUnicodeString has converted things it does not understand to '?'.

OK. But GetUnicodeString doesn't convert anything; did you mean that perl
converted things it didn't understand?
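Nick's point about internal storage can be observed directly, assuming perl 5.8.1 or later (`utf8::is_utf8` does not exist on 5.6.1). The sketch below compares `length` with `bytes::length` for a Japanese string, and for a Latin-1 string before and after `utf8::upgrade`; the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use bytes ();   # load bytes.pm for bytes::length without enabling the pragma

# Japanese text has to be stored as UTF-8 internally: 3 bytes per character here.
my $japanese = "\x{65e5}\x{672c}";

# "caf\x{e9}" (cafe with e-acute) fits in ISO-8859-1, so perl can keep it
# at one byte per character.
my $latin1   = "caf\x{e9}";
my $upgraded = $latin1;
utf8::upgrade($upgraded);   # same characters, now held internally as UTF-8

for my $case (['japanese', $japanese], ['latin1', $latin1], ['upgraded', $upgraded]) {
    my ($name, $s) = @$case;
    printf "%-9s length=%d bytes::length=%d utf8-flag=%s\n",
        $name, length($s), bytes::length($s), (utf8::is_utf8($s) ? 'on' : 'off');
}

# Internal storage differs, but the strings are still equal as characters.
print "latin1 eq upgraded\n" if $latin1 eq $upgraded;
```

Under those assumptions it should report bytes::length 6, 4 and 5 while length stays 2, 4 and 4. By the same arithmetic, 17 correctly decoded Japanese characters in the U+0800 to U+FFFF range would occupy 51 bytes internally, not 34; 34 is what UTF-16 would need, and perl does not use UTF-16 internally.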

print $unicode_string ;
# prints ??????????????? on the console

Hmm - as perl5.6 does not have "smart" Unicode IO (perl5.8 does),
this suggests that string is actually '?' x 17 - i.e. you got "junk"
back from the OLE call.
I don't think so. The OLE object behaves correctly (I tested it from a C++ app);
the difference now is that Win32::OLE is also involved.
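One way to settle whether the call returned real Japanese text or literal '?' characters is to print code points instead of the string itself. This sketch assumes perl 5.8 (for in-memory filehandles and the `:encoding` layer); `code_points` is a hypothetical helper, and the in-memory handle stands in for STDOUT. On an actual console the layer would need the console's encoding instead, for example cp932 on a Japanese Windows console, which is an assumption about this setup.

```perl
use strict;
use warnings;

# Hypothetical helper: list each character's code point, so a string of real
# '?' characters (U+003F) can be told apart from Japanese text that only
# *displays* as '?' on a console that cannot render it.
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

my $lossy = '?' x 3;              # what a lossy conversion would leave behind
my $good  = "\x{65e5}\x{672c}";   # two real Japanese characters

print code_points($lossy), "\n";  # U+003F U+003F U+003F
print code_points($good),  "\n";  # U+65E5 U+672C

# perl 5.8 I/O layers: encode characters explicitly on output.
# The in-memory handle collects the encoded bytes in $buffer.
open(my $out, '>', \my $buffer) or die "in-memory open: $!";
binmode($out, ':encoding(UTF-8)');
print {$out} $good;
close($out);

print unpack('H*', $buffer), "\n";  # e697a5e69cac
```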

2/ read a Unicode string from a file
   For perl5.6 the file has to be in UTF-8 and you need to do some hackery
   (which was so horrible I can't recall it).
   For perl5.8 this is easy; it was a major goal of perl5.8.
Did you see that hackery posted on this mailing list?
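For the perl 5.8 case, a minimal sketch of reading a UTF-8 file through the `:encoding(UTF-8)` I/O layer follows; it writes a temporary file first so it is self-contained. If the OLE object writes UTF-16 instead (common for Windows programs), the encoding name would have to change accordingly; that variant is not shown or tested here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write three Japanese characters to a temporary file as UTF-8 ...
my ($out, $path) = tempfile();
binmode($out, ':encoding(UTF-8)');
print {$out} "\x{65e5}\x{672c}\x{8a9e}\n";
close($out) or die "close: $!";

# ... and read them back. The :encoding(UTF-8) layer decodes the bytes,
# so $line holds characters, not raw UTF-8 octets.
open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $line = <$in>;
close($in);
chomp $line;

print length($line), " characters\n";   # prints "3 characters"
unlink $path;
```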

How can I flatten perl Unicode strings to binary?
That would tell us what perl has stored from the input, and how. I'll try
Devel::Peek. I have installed 5.8 on my Linux box and I'll do some tests. Still,
I have to run the script on a Win32 box with ActiveState Perl, even if I have to
jump through hoops.
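One way to flatten a perl character string to binary is to encode it explicitly with `Encode::encode` and dump the octets as hex with `unpack('H*', ...)`; that shows a chosen external encoding, not perl's internal buffer. `Devel::Peek`'s `Dump` covers the internal side (UTF8 flag and raw PV bytes) and writes to STDERR. The sketch assumes perl 5.8; `to_hex` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Devel::Peek qw(Dump);

# Hypothetical helper: encode a character string into the named byte
# encoding and return those bytes as a lowercase hex string.
sub to_hex {
    my ($string, $encoding) = @_;
    my $octets = encode($encoding, $string);   # character string -> byte string
    return unpack('H*', $octets);
}

my $text = "\x{65e5}\x{672c}";   # two Japanese characters

print 'UTF-8:    ', to_hex($text, 'UTF-8'), "\n";     # e697a5e69cac
print 'UTF-16LE: ', to_hex($text, 'UTF-16LE'), "\n";  # e5652c67

# Internal view: flags (including UTF8) and the PV buffer, printed to STDERR.
Dump($text);
```

If the file the OLE object writes is UTF-16LE, comparing its hex dump with `to_hex($unicode_string, 'UTF-16LE')` (apart from any byte-order mark) would be one quick check on whether the string survived the trip into perl.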

Thanks for your answers
Nadim.

