In other words, is there any character that will make ord() return a value
greater than 255 when it is passed in as a byte string?
For example, note the difference in output for character 257 (U+0101, "ā"):
as a Unicode string its ord() is 257, but as a byte string (its UTF-8
encoding, \xc4\x81) its ord() is 196, the value of the first byte.
$ perl -C6 -le 'print "Character 257 info:";
    print "\tunicode \\x{} notation: " . sprintf(q{\x{%x}}, 257);
    print "\tOutput as Unicode string \x{101}";
    print "\tunicode string \\x{} notation ord(): " . ord("\x{101}");
    print "\tbyte string grapheme ord(): " . ord "\xc4\x81";
    print "\tbyte string literal ord(): " . ord "ā";'
Character 257 info:
	unicode \x{} notation: \x{101}
	Output as Unicode string ā
	unicode string \x{} notation ord(): 257
	byte string grapheme ord(): 196
	byte string literal ord(): 196
$
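To check the same thing for more than one character, here is a minimal Perl sketch (the sample code points are arbitrary choices for illustration, not from any real data). It encodes each character to UTF-8 with the core utf8::encode() function and then calls ord() on every character of the resulting byte string. Since each character of an encoded byte string is a single byte, the values ord() can report there should stay in the range 0..255, while ord() on the whole string only ever reports its first character.

```perl
use strict;
use warnings;

# Sample code points (arbitrary, for illustration): ASCII, Latin-1,
# U+0101 from the example above, U+32B7, and one outside the BMP.
my @code_points = (0x41, 0xE9, 0x101, 0x32B7, 0x1F600);

my $max_byte_ord = 0;
for my $cp (@code_points) {
    my $chars = chr $cp;    # character string: ord($chars) == $cp
    my $bytes = $chars;
    utf8::encode($bytes);   # in place: now the UTF-8 bytes, one byte per character

    # ord() of each character of the byte string, i.e. each byte value.
    my @byte_values = map { ord($_) } split //, $bytes;
    for my $v (@byte_values) {
        $max_byte_ord = $v if $v > $max_byte_ord;
    }

    # ord() on the whole string only reports its first character.
    printf "U+%04X: ord(chars) = %d, ord(bytes) = %d, bytes = %s\n",
        $cp, ord($chars), ord($bytes),
        join(' ', map { sprintf('%02X', $_) } @byte_values);
}
print "largest ord() seen in any byte string: $max_byte_ord\n";
```

For U+0101 the ord(bytes) column should show 196 (0xC4), matching the one-liner above.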
The reason this is relevant: on a given project I am using byte strings only,
for consistency, and some encoders (e.g. Scalar::Quote::Q()) will switch from
byte-string-friendly \xNN notation (e.g. \xE3\x8A\xB7, the UTF-8 bytes of
U+32B7) to Unicode-string notation (e.g. \x{32B7}). I want to be sure I always
use data that gets me the former rather than the latter :)
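One way to guard against the \x{...} case is to check the characters themselves before handing data to an encoder. The sketch below is an assumption-laden illustration: is_byte_string(), to_byte_string() and byte_escape() are hypothetical helper names, not part of Scalar::Quote or any other module. It treats a string as a byte string when no character is above 255 (Perl's internal UTF8 flag is deliberately not consulted) and UTF-8-encodes anything wider.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character fits in one byte (0..255).
# Checks the characters, not Perl's internal UTF8 flag.
sub is_byte_string {
    my ($str) = @_;
    return $str !~ /[^\x00-\xFF]/;
}

# Hypothetical helper: return a byte string. Strings with wide characters
# (ord > 255) are UTF-8 encoded; others are returned unchanged.
sub to_byte_string {
    my ($str) = @_;
    return $str if is_byte_string($str);
    my $copy = $str;
    utf8::encode($copy);
    return $copy;
}

# Hypothetical helper: render a byte string in \xNN notation.
sub byte_escape {
    my ($bytes) = @_;
    die "not a byte string\n" unless is_byte_string($bytes);
    return join '', map { sprintf('\\x%02X', ord($_)) } split //, $bytes;
}

print byte_escape(to_byte_string("\x{32B7}")), "\n";    # \xE3\x8A\xB7
```

Note the design trade-off: a character string whose characters are all at or below 255 (e.g. "\x{e9}") passes through to_byte_string() unchanged, so it is treated as a Latin-1 byte rather than being UTF-8 encoded; whether that is acceptable depends on where the data comes from.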
TIA!
--
Dan Muey