--- SADAHIRO Tomoyuki <bqw10602(_at_)nifty(_dot_)com> wrote:
On Mon, 3 Oct 2005 07:13:15 -0700 (PDT), rajarshi das
<dazio_r(_at_)yahoo(_dot_)com> wrote:
Hi,
The following Unicode case-folding test fails on EBCDIC
(perl-5.8.6):
$a = '0178';
$b = '00FF';
$a1 = pack("U0U*", hex $code);
$b1 = pack("U0U*", map { hex } split " ", $mapping);
if (":$b1:" =~ /:[$a1]:/i) {
print "ok\n";
}
I guess $code is $a and $mapping is $b...
Alternatively, if $a = '0178' and $b = '00DF', the test
passes.
Why is this so?
Is it because \xFF, as a border case (1 less than 256),
is not handled properly?
0xDF in IBM 1047 (and some other EBCDIC encodings)
is ÿ (y with diaeresis), which corresponds to U+00FF,
and its uppercase is U+0178.
How about $a = '039C' and $b = '00A0' or '00B5'?
Here 0xA0 in IBM 1047 is µ (MICRO SIGN), which
corresponds to U+00B5, and its uppercase is U+039C.
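For reference, the two case pairs mentioned above can be checked
with a minimal sketch like the following. It is not part of the
original test; the variable names are made up for illustration, and
it uses pack("U", ...) rather than the "U0U*" template from the
original snippet. On an ASCII platform all four checks are expected
to succeed with a recent perl; the point is to compare the results
against an EBCDIC build.

```perl
use strict;
use warnings;

# pack("U", ...) takes Unicode code points, whatever the native encoding is.
my $y_lower = pack("U", 0x00FF);   # LATIN SMALL LETTER Y WITH DIAERESIS
my $y_upper = pack("U", 0x0178);   # LATIN CAPITAL LETTER Y WITH DIAERESIS
my $micro   = pack("U", 0x00B5);   # MICRO SIGN
my $mu      = pack("U", 0x039C);   # GREEK CAPITAL LETTER MU

# Uppercase mappings.
my $uc_y_ok     = uc($y_lower) eq $y_upper;
my $uc_micro_ok = uc($micro)   eq $mu;

# Case-insensitive character-class matches, as in the failing test.
my $fold_y_ok     = (":$y_lower:" =~ /:[$y_upper]:/i) ? 1 : 0;
my $fold_micro_ok = (":$micro:"   =~ /:[$mu]:/i)      ? 1 : 0;

print "uc(U+00FF) eq U+0178: ", ($uc_y_ok       ? "ok" : "not ok"), "\n";
print "uc(U+00B5) eq U+039C: ", ($uc_micro_ok   ? "ok" : "not ok"), "\n";
print "U+00FF =~ /[U+0178]/i: ", ($fold_y_ok     ? "ok" : "not ok"), "\n";
print "U+00B5 =~ /[U+039C]/i: ", ($fold_micro_ok ? "ok" : "not ok"), "\n";
```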
Does anyone have any thoughts on the source of the
problem?
Possibly a Unicode code point and a native code point
are being confused.
When the native encoding is EBCDIC, this causes much
more trouble than in the ASCII/Latin-1 case.
Or is the value stored in $b1, generated by
pack("U0U*", map { hex } split " ", '00FF'),
really a representation of U+00FF?
Could you use Devel::Peek and show what
Devel::Peek::Dump($b1) outputs?
## example of usage of Devel::Peek ##
use Devel::Peek;
$b1 = pack("U0U*", map { hex } split " ", '00FF');
Dump($b1);
## example of output from Devel::Peek::Dump ##
SV = PV(0x36572c) at 0x182c96c
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x36d9b4 "\303\277"\0 [UTF8 "\x{ff}"]
CUR = 2
LEN = 4
Here PV is the string buffer, and "\303\277" is U+00FF
encoded in UTF-8.
In UTF-EBCDIC, the output should be different.
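The octal escapes in the PV line can also be reproduced without
Devel::Peek. The following is only a sketch (the variable names are
illustrative): it encodes U+00FF to the platform's internal encoding
with utf8::encode and prints each octet in octal. On an EBCDIC perl
the octets would be the UTF-EBCDIC ones instead.

```perl
use strict;
use warnings;

# Build the character U+00FF, then turn it into its encoded octets.
my $bytes = pack("U", 0x00FF);
utf8::encode($bytes);   # in place: character string -> encoded octets

# Format every octet as a three-digit octal escape, as Dump does.
my $octal = join "", map { sprintf "\\%03o", ord } split //, $bytes;
print "$octal\n";       # \303\277 on an ASCII platform
```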
The following is the output of Dump($b1) in UTF-EBCDIC:
SV = PV(0x20db050c) at 0x20dcaf9c
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x20db5a70 "\213\163"\0 [UTF8 "\x{df}"]
CUR = 2
LEN = 3
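The [UTF8 "\x{df}"] in this dump appears consistent with the remark
above that 0xDF is ÿ in IBM 1047, i.e. the string holds ÿ under its
native code point. A short way to see both numbers side by side,
as a sketch only, is to compare ord() (native code point) with
utf8::native_to_unicode() (Unicode code point). On ASCII platforms
the two are always equal; on EBCDIC they are expected to differ for
this character.

```perl
use strict;
use warnings;

# pack("U", ...) is given the Unicode code point U+00FF.
my $b1 = pack("U", 0x00FF);

my $native  = ord($b1);                          # native code point
my $unicode = utf8::native_to_unicode($native);  # Unicode code point

my $summary = sprintf "native 0x%02X, Unicode U+%04X", $native, $unicode;
print "$summary\n";
```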
Rajarshi.
regards,
SADAHIRO Tomoyuki