perl-unicode

Re: Am I correct in thinking that the only way to get ord() to return a value over 256 is to send the character as a Unicode string instead of a byte string?

2010-10-28 17:27:42
Dan Muey wrote on 28.10.2010 at 14:54 (-0500):

Am I correct in thinking that the only way to get ord() to return a
value over 256 is to send the character as a Unicode string instead of
a byte string?

Yes.

In other words, is there any character that will make ord() return
over 256 when passed in as a byte string?

No. In a byte string every element is a single 8-bit byte, so the
largest value ord() can return for it is 255.

For example, note the difference in output between a Unicode string
and a byte string for character 257: as a Unicode string it is 257;
as a byte string it is 196.

Yes.

  perl -Mutf8 -lwe 'print ord "Я"'  # 1071
  perl        -lwe 'print ord "Я"'  #  208
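
The same comparison for character 257 can be written as a short
script. This is only a sketch: it assumes the byte string is the UTF-8
encoding of the character, produced with the core Encode module, and
the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string: one character, U+0101.
my $char = chr(257);

# Byte string: the UTF-8 encoding of that character, "\xC4\x81".
my $bytes = encode('UTF-8', $char);

print ord($char), "\n";      # 257
print ord($bytes), "\n";     # 196 (0xC4, the first byte only)
print length($bytes), "\n";  # 2
```

ord() only looks at the first element of its argument, so for the byte
string it reports the first byte of the encoding, not the character.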

The reason this is relevant is that on a given project I am using
byte strings only, for consistency, and some encoders (e.g.
Scalar::Quote::Q()) will change from byte-string-friendly
grapheme-cluster notation (e.g. \xE3\x8A\xB7) to Unicode-string
notation (e.g. \x{32B7}), and I want to be sure I always use data that
gets me the former rather than the latter :)

Well, if you don't need character operations, it might work for you.
Make sure to track whether or not your data is already encoded, and
make sure to use the correct encoding.
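
One way to keep track is to encode explicitly at the point where bytes
are needed, and to refuse anything that still contains a value above
255. A minimal sketch, assuming UTF-8 is the encoding you want;
escape_bytes() is a hypothetical helper written for this example, not
part of Scalar::Quote or any other module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render each byte of a byte string as \xHH.
# Dies if the string holds anything above 255, i.e. it was not encoded.
sub escape_bytes {
    my ($bytes) = @_;
    die "not a byte string (contains a character above 255)\n"
        if $bytes =~ /[^\x00-\xFF]/;
    return join '', map { sprintf '\\x%02X', ord($_) } split //, $bytes;
}

my $chars   = "\x{32B7}";                # character string, length 1
my $encoded = encode('UTF-8', $chars);   # byte string, length 3

print escape_bytes($encoded), "\n";      # \xE3\x8A\xB7
```

Passing $chars itself to escape_bytes() dies, which is the signal that
the data has not been encoded yet.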

-- 
Michael Ludwig