On Oct 29, 2010, at 2:30 AM, Aristotle Pagaltzis wrote:
* Dan Muey <dan(_at_)cpanel(_dot_)net> [2010-10-28 21:55]:
For example, note the differences in output between a unicode
string and a byte string regarding character 257, as a unicode
string it is 257, as a byte string it is 196.
That is not what’s going on.
$ perl -E'say ord "1234"'
49
When you pass a multi-character string to `ord`, you get the code
point of the first character.
Thank you for clarifying what I was highlighting.
You are missing the rest of the bytes from the UTF-8 encoding.
You are losing data.
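
To make the 257-versus-196 observation concrete, here is a minimal sketch. It assumes the character in question is U+0101 (decimal 257); the variable names are mine and purely illustrative. It shows that 196 is only the first byte of the two-byte UTF-8 encoding, which is why calling ord on the byte string drops the rest.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0101 (decimal 257) as a one-character Unicode string.
my $chars = "\x{101}";

# The same character encoded as UTF-8: a two-byte string (C4 81).
my $bytes = encode('UTF-8', $chars);

printf "character string: length %d, ord %d\n", length $chars, ord $chars;
printf "byte string:      length %d, ord %d\n", length $bytes, ord $bytes;

# ord() only reports the first element; unpack lists every byte,
# including the one that ord() silently skipped.
my @all = unpack 'C*', $bytes;
print "all bytes: @all\n";
```

The character string reports one element with ord 257. The byte string reports two elements, and ord sees only the first of them (196); the second byte (129) is the part that goes missing.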
Thanks, I do understand that, and I appreciate you expounding on it further.
Allow me to explain why this question came up:
I am using Scalar::Quote on byte strings, and it uses ord() to decide whether
to emit byte escape notation (e.g. \xE3\x8A\xB7) or Unicode code point
notation (e.g. \x{32B7}).
multivac:~ dmuey$ perl -MScalar::Quote=Q -E 'say Q("Perl is the ㊷™");'
"Perl is the \xe3\x8a\xb7\xe2\x84\xa2"
multivac:~ dmuey$
multivac:~ dmuey$ perl -E 'say "Perl is the \xe3\x8a\xb7\xe2\x84\xa2";'
Perl is the ㊷™
multivac:~ dmuey$
It appears to do what I need, assuming two things:
a) the string is a byte string
(i.e. not a character string, as in: perl -MScalar::Quote=Q -E 'say Q("Perl is the \x{32b7}\x{2122}");')
b) we are not under "use utf8"
(i.e. not: perl -MScalar::Quote=Q -E 'use utf8; say Q("Perl is the ㊷™");')
I just wanted to verify that its use of ord() in its logic wouldn't
unexpectedly result in me getting back \x{32B7} under some weird circumstance
I overlooked.
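
For what it's worth, here is a rough sketch of the kind of ord()-based decision I have in mind. escape_by_ord is a hypothetical helper I wrote for illustration; it is not Scalar::Quote's actual code, and I am only assuming the module behaves along these lines. Every element of a byte string has an ord of 255 or less, so the \x{...} branch should never fire for one.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Scalar::Quote's implementation: escape every
# element outside printable ASCII, picking the notation from ord() alone.
sub escape_by_ord {
    my ($str) = @_;
    return join '', map {
        my $n = ord $_;
        $n > 255              ? sprintf('\x{%x}', $n)   # wide character
      : ($n < 32 || $n > 126) ? sprintf('\x%02x', $n)   # single byte value
      :                         $_;                     # printable ASCII
    } split //, $str;
}

# Byte string: the UTF-8 encoding of U+32B7 and U+2122, written as raw bytes.
my $bytes = "\xe3\x8a\xb7\xe2\x84\xa2";

# Character string: the same text as two code points.
my $chars = "\x{32b7}\x{2122}";

my $from_bytes = escape_by_ord($bytes);
my $from_chars = escape_by_ord($chars);

print "$from_bytes\n";   # \xe3\x8a\xb7\xe2\x84\xa2
print "$from_chars\n";   # \x{32b7}\x{2122}
```

Under that assumption, the only way to get \x{32B7} back is to hand it a string that already holds the decoded character, which is what conditions (a) and (b) rule out.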
Thanks again, everyone. I really appreciate it!
--
Dan Muey