perl-unicode

How to handle Unicode strings in utf8 and pre-utf8 pragma perls

2003-05-31 01:30:04
Hello

I'd be grateful if someone could help me with this, as I know very little
about Unicode.

I currently have Unicode data stored either as \x{...} escapes or as raw
bytes, depending on the perl version, e.g.:

  my @day_names;

  if ( $] >= 5.006 )
  {
    @day_names =
    (
      "\x{0414}\x{04af}\x{0439}\x{0448}\x{04e9}\x{043c}\x{0431}\x{04af}",
      "\x{0428}\x{0435}\x{0439}\x{0448}\x{0435}\x{043c}\x{0431}\x{0438}",
      "\x{0428}\x{0430}\x{0440}\x{0448}\x{0435}\x{043c}\x{0431}\x{0438}",
      "\x{0411}\x{0435}\x{0439}\x{0448}\x{0435}\x{043c}\x{0431}\x{0438}",
      "\x{0416}\x{0443}\x{043c}\x{0430}",
      "\x{0418}\x{0448}\x{0435}\x{043c}\x{0431}\x{0438}",
      "\x{0416}\x{0435}\x{043a}\x{0448}\x{0435}\x{043c}\x{0431}\x{0438}"
    );
  }
  else
  {
    @day_names =
    (
      'Дүйшөмбү',
      'Шейшемби',
      'Шаршемби',
      'Бейшемби',
      'Жума',
      'Ишемби',
      'Жекшемби'
    );
  }

What I would really like to do is avoid this duplication by storing only
the byte representations and flagging them as Unicode when running under
perl 5.006 or later.

Conceptually something like:

  use utf8 if $] >= 5.006;    # Yes, I know this won't even compile in
                              # reality :)

  my @day_names =
  (
    'Дүйшөмбү',
    'Шейшемби',
    'Шаршемби',
    'Бейшемби',
    'Жума',
    'Ишемби',
    'Жекшемби'
  );

Of course that won't work, but that's the kind of thing I'm aiming for.
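
A compile-time condition of this kind can be approximated by calling the pragma's import from a BEGIN block, which is what a "use" statement expands to anyway. This is only a sketch: it assumes the source file is saved as UTF-8, and $day_name is a name made up for illustration.

```perl
# Sketch only: switch on the utf8 pragma at compile time, but only on
# perls that ship it. Older perls skip the block and see plain bytes.
BEGIN
{
  if ( $] >= 5.006 )
  {
    require utf8;
    utf8->import;    # sets the compile-time hint for the enclosing file
  }
}

# Hypothetical example literal, assumed to be UTF-8 in the source file.
my $day_name = 'Жума';

# Counts characters when the pragma is active, bytes otherwise.
print length( $day_name ), "\n";
```

Because the pragma is lexically scoped, the BEGIN block has to sit in the same file and enclosing scope as the literals it is meant to affect.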

So, a couple of questions:

1) Does what I'm trying to do make sense?
2) Is there an easy way of doing it?
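
On the second question, one possible route is to keep only the byte strings and decode them at run time when the interpreter can do so. The sketch below assumes the literals are UTF-8 encoded and relies on the built-in utf8::decode(). The helper name decode_if_supported is hypothetical, and the 5.008 cut-off (rather than 5.006) is an assumption, chosen because utf8::decode() is available without loading anything from perl 5.8 onward.

```perl
use strict;

# Hypothetical helper (not from the original post): takes strings holding
# raw UTF-8 bytes and returns them decoded into character strings where
# the running perl supports it. Older perls get the bytes back unchanged.
sub decode_if_supported
{
  my @strings = @_;

  # Assumption: 5.008 is the cut-off because utf8::decode() is built in
  # from perl 5.8 onward.
  if ( $] >= 5.008 )
  {
    foreach my $string ( @strings )
    {
      utf8::decode( $string );    # converts the bytes in place
    }
  }

  return @strings;
}

# No "use utf8" here, so these literals are plain byte strings.
my @day_names = decode_if_supported(
  'Дүйшөмбү',
  'Шейшемби',
  'Шаршемби',
  'Бейшемби',
  'Жума',
  'Ишемби',
  'Жекшемби'
);
```

With this arrangement there is a single list of names, and only the decoding step depends on the perl version.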

Any help would be really appreciated.

Cheers,
Rich
-- 
Richard Evans
scriptyrich(_at_)yahoo(_dot_)co(_dot_)uk