perl-unicode

How to get equivalent of \x{%04X} from Jcode

2004-01-28 16:30:11
Hi,

I've now got regexps working with utf8, but I'm trying to make things
backwards compatible, at least as far back as perl5.6.1.

The way I'm doing things right now, I'm constructing regexps on the fly
by doing something like this:

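  use Encode qw(decode);

  my $input_in_euc;   # EUC-JP encoded input
  # Build a \x{XXXX} escape from the code point of the decoded character
  my $re = sprintf( '\x{%04X}', unpack( "U ",
     decode( 'euc-jp', $input_in_euc ) ) );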
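A sketch of the same idea extended to every character of the input rather than only the first, assuming `$input_in_euc` holds EUC-JP bytes (the fullwidth "０２" sample below is illustrative). It takes code points with `ord` on the decoded characters instead of `unpack`, which avoids depending on how `unpack 'U'` treats its argument on a given perl version.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: EUC-JP bytes for fullwidth "０２" (U+FF10, U+FF12).
my $input_in_euc = "\xA3\xB0\xA3\xB2";

# Decode the EUC-JP bytes into a Perl character string.
my $chars = decode('euc-jp', $input_in_euc);

# Emit one \x{XXXX} escape per character, not just for the first one.
my $re = join '', map { sprintf '\x{%04X}', ord($_) } split //, $chars;

# $re holds the literal text; the escapes are interpreted by qr//.
my $pattern = qr/$re/;

print "$re\n";    # prints \x{FF10}\x{FF12}
print "matched\n" if $chars =~ $pattern;
```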

Now I want to do the same thing using Jcode. I wrote a test script that
I run under perl5.8.3, and I'm finding that Encode and Jcode don't give
the same output. Here's my test code:

  use Jcode;
  use Encode;

  my $x = "\xA3\xB0\xA3\xB2";   # EUC-JP bytes for fullwidth "０２"
  my $u = decode('euc-jp', $x);
  my $jc = Jcode->new($x);
  my $uj = $jc->utf8;

  my $ul = length($u);
  my $ujl = $jc->jlength();

  print "ul = $ul, ujl = $ujl\n";

  printf( ("\\x{%04X}" x $ul) . "\n", unpack('U ' x $ul, $u));
  printf( ("\\x{%04X}" x $ujl) . "\n", unpack('U ' x $ujl, $uj));

When I run this I get an output like this:

  ul = 2, ujl = 2
  \x{FF10}\x{FF12}
  \x{00EF}\x{00BC}

I suppose there's some difference in the underlying representation, but
is there a way for me to get the same behavior from both Encode and
Jcode?
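The output above suggests that `$jc->utf8` returns UTF-8 octets (a byte string) while `decode` returns characters: `\x{00EF}\x{00BC}` are the first two bytes of the UTF-8 encoding of U+FF10. A sketch of one way to line the two up, assuming Encode is available (it ships with perl 5.8, not 5.6.1), is to decode the octets with `Encode::decode_utf8` before taking code points. To keep the example runnable without Jcode, `$uj` is a literal standing in for what `$jc->utf8` produced, and `codepoints` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Hypothetical helper: format each character's code point as \x{XXXX}.
sub codepoints {
    my ($str) = @_;
    return join '', map { sprintf '\x{%04X}', ord($_) } split //, $str;
}

# Stand-in for $jc->utf8: the UTF-8 octets of fullwidth "０２".
my $uj = "\xEF\xBC\x90\xEF\xBC\x92";
my $u  = decode('euc-jp', "\xA3\xB0\xA3\xB2");

print codepoints($uj), "\n";               # one escape per byte (six of them)
print codepoints(decode_utf8($uj)), "\n";  # \x{FF10}\x{FF12}
print codepoints($u), "\n";                # \x{FF10}\x{FF12}
```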

Furthermore, if I want to grab a string, convert it to utf8 and then run
matches on it, is it even possible to do so with perl5.6.1 and Jcode?
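Without Encode, one hedged option is to turn Jcode's UTF-8 octets into code points by hand and build the `\x{....}` escapes from those numbers. This is a sketch under assumptions, not a tested perl5.6.1 recipe: `utf8_octets_to_codepoints` is a hypothetical helper, it handles only well-formed sequences of up to four bytes and does no validation, and whether perl5.6.1's regex engine then matches the resulting pattern as hoped is a separate question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string of UTF-8 octets (such as the value
# returned by Jcode's utf8 method) into a list of code points.
# Assumes well-formed input; no error checking is done.
sub utf8_octets_to_codepoints {
    my @bytes = unpack 'C*', shift;
    my @cps;
    while (@bytes) {
        my $byte = shift @bytes;
        my ($cp, $extra);
        if    ($byte < 0x80) { ($cp, $extra) = ($byte,        0) }
        elsif ($byte < 0xE0) { ($cp, $extra) = ($byte & 0x1F, 1) }
        elsif ($byte < 0xF0) { ($cp, $extra) = ($byte & 0x0F, 2) }
        else                 { ($cp, $extra) = ($byte & 0x07, 3) }
        # Each continuation byte contributes its low six bits.
        $cp = ($cp << 6) | (shift(@bytes) & 0x3F) for 1 .. $extra;
        push @cps, $cp;
    }
    return @cps;
}

# UTF-8 octets of fullwidth "０２", as Jcode's utf8 method would produce.
my $uj = "\xEF\xBC\x90\xEF\xBC\x92";
my $re = join '', map { sprintf '\x{%04X}', $_ } utf8_octets_to_codepoints($uj);
print "$re\n";    # prints \x{FF10}\x{FF12}
```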

TIA,
--d

  • How to get equivalent of \x{%04X} from Jcode, Daisuke Maki