perl-unicode

Re: Character (or byte?) escapes under utf8 pragma

2010-03-11 06:11:24
Michael Ludwig wrote 2010-03-10 10:34 (+0100):
> Okay. Let me try to see if I have understood correctly. Without the utf8
> pragma in scope, "so\xa0ein\xa0Käse" with a-Umlaut stored as a sequence
> of two bytes in my source code will be stored internally as a sequence
> of 12 integers. With the utf8 pragma in scope, only 11 integers.

"so\xa0ein\xa0Käse" must be stored as either:

    l1: 73 6f a0 65 69 6e a0 4b e4 73 65 (UTF8 flag off)

or:

    u8: 73 6f c2-a0 65 69 6e c2-a0 4b c3-a4 73 65 (UTF8 flag on)

Both strings should be semantically equal, and have 11 characters, each
of which has an integer ordinal value.
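
A minimal sketch of the two storage forms, assuming any reasonably recent perl 5. The string is built from escapes only, so the result does not depend on the encoding of the source file or on the utf8 pragma; the helper name flag() is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: report the state of the internal UTF8 flag.
sub flag { utf8::is_utf8($_[0]) ? "on" : "off" }

# Escapes only: 11 characters, all below 0x100, stored one byte each.
my $l1 = "so\xa0ein\xa0K\xe4se";

# Same characters, but force the internal UTF-8 representation.
my $u8 = $l1;
utf8::upgrade($u8);

printf "l1: length %d, UTF8 flag %s\n", length $l1, flag($l1);
printf "u8: length %d, UTF8 flag %s\n", length $u8, flag($u8);
print  "equal: ", ($l1 eq $u8 ? "yes" : "no"), "\n";

# The ordinals are the same regardless of how the string is stored.
my $ords_l1 = join ' ', map { sprintf '%02x', ord } split //, $l1;
my $ords_u8 = join ' ', map { sprintf '%02x', ord } split //, $u8;
print "ordinals l1: $ords_l1\n";
print "ordinals u8: $ords_u8\n";

# Encoding a copy to UTF-8 octets shows the c2-a0 and c3-a4 pairs.
my $octets = $u8;
utf8::encode($octets);
print "utf-8 octets: ", join(' ', unpack '(H2)*', $octets), "\n";
```

Both strings should report length 11 and compare equal; only the flag and the number of octets used for storage differ.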

What happens is the following:

    73 6f a0 65 69 6e a0 4b c3-a4 73 65 (UTF8 flag on)
          l1          l1     u8

This is wrong. It is a bug.
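
To check how a particular perl treats such a literal, something like the following sketch could be used. It assumes the file is saved as UTF-8, and it compares the mixed literal against a reference built from escapes only. What it reports may differ between perl versions, so no output is shown here.

```perl
use strict;
use warnings;
use utf8;   # assumption: this source file is saved as UTF-8

# Escapes mixed with a literal a-Umlaut, as in the example under discussion.
my $literal = "so\xa0ein\xa0Käse";

# Reference built from escapes only: 11 characters.
my $reference = "so\xa0ein\xa0K\xe4se";

my @ords = map { sprintf '%02x', ord } split //, $literal;

printf "characters: %d\n", scalar @ords;
print  "ordinals:   @ords\n";
printf "UTF8 flag:  %s\n", utf8::is_utf8($literal) ? "on" : "off";
printf "consistent: %s\n", utf8::valid($literal)   ? "yes" : "no";
printf "matches escape-only reference: %s\n",
    $literal eq $reference ? "yes" : "no";
```

A correct result has 11 characters with a0 at the third and seventh positions and e4 at the ninth; a bare a0 byte inside a string flagged as UTF-8 would show up as a mismatch or as an inconsistent string.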
-- 
Met vriendelijke groet, // Kind regards, // Korajn salutojn,

Juerd Waalboer  <juerd(_at_)tnx(_dot_)nl>
TNX
