Michael Ludwig wrote 2010-03-10 10:34 (+0100):
> Okay. Let me try to see if I have understood correctly. Without the utf8
> pragma in scope, "so\xa0ein\xa0Käse" with a-Umlaut stored as a sequence
> of two bytes in my source code will be stored internally as a sequence
> of 12 integers. With the utf8 pragma in scope, only 11 integers.
"so\xa0ein\xa0Käse" must be stored as either:
l1: 73 6f a0 65 69 6e a0 4b e4 73 65 (UTF8 flag off)
or:
u8: 73 6f c2-a0 65 69 6e c2-a0 4b c3-a4 73 65 (UTF8 flag on)
Both strings should be semantically equal, and have 11 characters, each
of which has an integer ordinal value.
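A minimal sketch of that equivalence, for illustration only. The variable
names $l1 and $u8 are mine, taken from the labels above, and the string is
built from \x escapes alone so that the result does not depend on how the
source file is encoded or on whether the utf8 pragma is in scope:

```perl
use strict;
use warnings;

# \xe4 is a-umlaut. Written with escapes only, the literal does not
# depend on the encoding of this source file.
my $l1 = "so\xa0ein\xa0K\xe4se";   # Latin-1 storage, UTF8 flag off

# Copy the string and switch the copy to the internal UTF-8 storage.
# The characters stay the same; only the representation changes.
my $u8 = $l1;
utf8::upgrade($u8);                # UTF8 flag on

printf "l1: flag=%d length=%d\n", (utf8::is_utf8($l1) ? 1 : 0), length $l1;
printf "u8: flag=%d length=%d\n", (utf8::is_utf8($u8) ? 1 : 0), length $u8;
print  "equal: ", ($l1 eq $u8 ? "yes" : "no"), "\n";

# The ordinal values are per character, not per stored byte.
print join(" ", map { sprintf "%02x", ord } split //, $u8), "\n";
```

Both strings should report a length of 11 and compare as equal, and the
last line should list the same eleven ordinals as the l1 row above, even
though $u8 is stored as the longer byte sequence.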
What happens is the following:
73 6f a0 65 69 6e a0 4b c3-a4 73 65 (UTF8 flag on)
      l1          l1    u8
This is wrong. It is a bug.
--
Met vriendelijke groet, // Kind regards, // Korajn salutojn,
Juerd Waalboer <juerd(_at_)tnx(_dot_)nl>
TNX