perl-unicode

Re: Am I correct in thinking that the only way to get ord() to return a value over 256 is to send the character as a Unicode string instead of a byte string?

2010-10-29 08:19:21
On Oct 29, 2010, at 2:30 AM, Aristotle Pagaltzis wrote:

> * Dan Muey <dan(_at_)cpanel(_dot_)net> [2010-10-28 21:55]:
> > For example, note the differences in output between a unicode
> > string and a byte string regarding character 257, as a unicode
> > string it is 257, as a byte string it is 196.
>
> That is not what’s going on.
>
>    $ perl -E'say ord "1234"'
>    49
>
> When you pass a multi-character string to `ord`, you get the code
> point of the first character.

Thank you for clarifying what I was highlighting.

> You are missing the rest of the bytes from the UTF-8 encoding.
>
> You are losing data.
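
A minimal sketch of the point being made here, assuming the character in question is U+0101 (decimal 257), whose UTF-8 encoding is the two bytes 0xC4 0x81. The variable names are only illustrative. It shows that ord() on the byte string reports the first byte (196) and silently ignores the second one:

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded octets for U+0101 (decimal 257): 0xC4 0x81
my $bytes = "\xC4\x81";
# The same data decoded into a one-character Unicode string
my $chars = decode('UTF-8', $bytes);

printf "bytes: length=%d ord=%d\n", length $bytes, ord $bytes;   # bytes: length=2 ord=196
printf "chars: length=%d ord=%d\n", length $chars, ord $chars;   # chars: length=1 ord=257

# ord() only looks at the first element; walking the whole byte string
# shows the byte that a bare ord() call never reports.
printf "all byte values: %s\n", join ' ', map { ord } split //, $bytes;   # all byte values: 196 129
```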

Thanks, I do understand that, and I appreciate you expounding on it further.
Allow me to explain why this question came up:

I am using Scalar::Quote on byte strings, and it uses ord() to decide whether
to emit byte-string notation (e.g. \xE3\x8A\xB7) or Unicode-string notation
(e.g. \x{32B7}).

multivac:~ dmuey$ perl -MScalar::Quote=Q -E 'say Q("Perl is the ㊷™");'
"Perl is the \xe3\x8a\xb7\xe2\x84\xa2"
multivac:~ dmuey$ 

multivac:~ dmuey$ perl -E 'say "Perl is the \xe3\x8a\xb7\xe2\x84\xa2";'
Perl is the ㊷™
multivac:~ dmuey$
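
The ord()-based choice described above could be sketched roughly as follows. This is not Scalar::Quote's actual code; quote_sketch is a hypothetical helper written only to illustrate the idea that a value above 255 forces \x{...} notation, while anything else outside printable ASCII can be written as a single \xNN byte escape:

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Scalar::Quote's real implementation:
# pick the escape notation from the value ord() returns per element.
sub quote_sketch {
    my ($str) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        my $n = ord $ch;
        if ($n > 255) {
            $out .= sprintf '\\x{%x}', $n;     # wide character: \x{32b7}
        }
        elsif ($ch =~ /["\\\$\@]/) {
            $out .= "\\$ch";                   # characters special inside "..."
        }
        elsif ($n < 32 || $n > 126) {
            $out .= sprintf '\\x%02x', $n;     # single byte: \xe3
        }
        else {
            $out .= $ch;                       # printable ASCII stays as-is
        }
    }
    return qq{"$out"};
}

# Byte string (six UTF-8 octets): only \xNN escapes can appear
print quote_sketch("\xe3\x8a\xb7\xe2\x84\xa2"), "\n";   # "\xe3\x8a\xb7\xe2\x84\xa2"

# Character string (two code points above 255): \x{...} escapes appear
print quote_sketch("\x{32b7}\x{2122}"), "\n";           # "\x{32b7}\x{2122}"
```

Under this sketch, \x{...} output is only possible when some element of the input has a value above 255, which cannot happen in a string that holds only octets.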

It appears to do what I need, assuming two things:
 a) the string is a byte string
     (unlike, e.g., perl -MScalar::Quote=Q -E 'say Q("Perl is the \x{32b7}\x{2122}");')
 b) we are not under "use utf8"
     (unlike, e.g., perl -MScalar::Quote=Q -E 'use utf8; say Q("Perl is the ㊷™");')

I just wanted to verify that its use of ord() in its logic wouldn't
unexpectedly result in me getting back \x{32B7} under some weird circumstance
I overlooked.
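
One possible way to rule that out, sketched here under the assumption that the data is meant to be UTF-8, is to normalize the input to octets before quoting it. as_utf8_bytes is a hypothetical helper, not part of Scalar::Quote. Note that utf8::is_utf8 only reports Perl's internal flag, so this is not a general "is this text?" test; it only guarantees that no element of the result has a value above 255:

```perl
use strict;
use warnings;
use Encode qw(encode);
use List::Util qw(max);

# Hypothetical guard: hand back octets whether the caller passed a
# flagged character string or a plain byte string.
sub as_utf8_bytes {
    my ($str) = @_;
    return utf8::is_utf8($str) ? encode('UTF-8', $str) : $str;
}

# A character string with two code points above 255 becomes six octets
my $bytes = as_utf8_bytes("\x{32b7}\x{2122}");
my $highest = max(map { ord } split //, $bytes);

printf "length=%d highest ord=%d\n", length $bytes, $highest;   # length=6 highest ord=227
```

With every element at 255 or below, an ord()-based check has nothing that could trigger the \x{...} form.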

Thanks again, everyone. I really appreciate it!

--
Dan Muey