perl-unicode

Re: 2 Suprises w/5.8.0

2002-08-01 01:30:05
On Thu, 1 Aug 2002 06:33:07 +0300, Jarkko Hietaniemi 
<jhi(_at_)iki(_dot_)fi> said:

  > Pre-5.8 way of Unicode (or, even worse, pre-5.6 way of Unicode) simply
  > is not compatible, and trying to bridge the gap is probably worse than
  > it's worth.

I agree with Jarkko if you write new code. But for old code the answer
must be different.

I guess Daniel has code that works under pre-5.8 perls and now wants
it to run under 5.8 without breaking compatibility with earlier
versions. The reason is easy to understand: you cannot port code to
5.8 in one stroke; it can take several months until you have found
every spot in your code that needs a change. So he needs to keep
compatibility with older perls until he can switch to 5.8 safely.

I ported the code of PAUSE to 5.8.0 within a few hours, but just
yesterday I discovered a missing encode_utf8(). It took me many hours
to find. I was glad that I could still run the whole of PAUSE under
5.6.1.

Daniel, if this is the background of your request, I'd say:

- keep using Unicode::String
- keep using the utf8 pragma if 5.6.1 needed it
- don't throw away old code until you feel really safe 
- enclose all changes you try out for 5.8.0 in

    if ($] > 5.007) {
      # code that isn't understood by 5.6.1
    }

- don't hesitate to ask for practical advice on this list.

These are typical changes that you might need:

A filehandle that should read or write UTF-8:

  if ($] > 5.007) {
    binmode $fh, ":utf8";
  }
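
To see what the layer does, here is a minimal round-trip sketch. The
scratch file name and the sample text are made up for the illustration;
the only thing taken from the snippet above is the guarded binmode call.

```perl
use strict;
use warnings;

# Hypothetical scratch file; any writable path would do.
my $path = "utf8_layer_demo.$$.txt";
my $text = "caf\x{e9} \x{263a}";   # sample text with two non-ASCII characters

open(my $out, ">", $path) or die "cannot write $path: $!";
binmode $out, ":utf8" if $] > 5.007;   # characters go out as UTF-8 octets
print $out $text;
close $out or die "cannot close $path: $!";

my $size = -s $path;                   # size on disk, counted in octets

open(my $in, "<", $path) or die "cannot read $path: $!";
binmode $in, ":utf8" if $] > 5.007;    # octets come back as characters
my $back = do { local $/; <$in> };     # slurp the whole file
close $in;
unlink $path;

print "$size octets on disk, ", length($back), " characters in perl\n";
```

The file holds more octets than the string has characters, because each
non-ASCII character takes more than one octet in UTF-8.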

A scalar that is going to be passed to some extension, be it
Compress::Zlib, Apache::Request or any extension that has no mention
of Unicode in the manpage:

  if ($] > 5.007) {
    require Encode;
    utf8::upgrade($self->{CONTENT}); # make sure it is UTF-8 encoded
    $self->{CONTENT} = Encode::encode_utf8($self->{CONTENT}); # make octets
  }
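
A small sketch of what those two calls do to a scalar, assuming a
made-up sample string in place of $self->{CONTENT}: the upgrade changes
only the internal representation, while encode_utf8 produces a plain
octet string that is safe to hand to an extension.

```perl
use strict;
use warnings;
use Encode ();

my $content = "na\xEFve";        # sample data: 5 characters, one non-ASCII
utf8::upgrade($content);         # same characters, internal UTF-8 storage
my $chars_before = length($content);

my $octets = Encode::encode_utf8($content);   # make octets

printf "%d characters became %d octets\n", $chars_before, length($octets);
print "result is ", (utf8::is_utf8($octets) ? "flagged" : "plain octets"), "\n";
```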

A scalar that came back from an extension and that we believe holds
UTF-8:

  if ($] > 5.007) {
    require Encode;
    $val = Encode::decode_utf8($val);
  }

Same thing, if you are really sure it is valid UTF-8 (_utf8_on only
sets the flag and does not check the data):

  if ($] > 5.007) {
    require Encode;
    Encode::_utf8_on($s);
  }
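
The difference between the last two variants can be sketched like this.
The sample strings are invented; the strict check with Encode::decode
and FB_CROAK is my own addition for the illustration, not something the
snippets above rely on.

```perl
use strict;
use warnings;
use Encode ();

my $octets = "caf\xC3\xA9";            # well-formed UTF-8 octets

my $flagged = $octets;
Encode::_utf8_on($flagged);            # just flips the flag, no validation
my $decoded = Encode::decode_utf8($octets);

print "same result for well-formed input\n" if $flagged eq $decoded;

# A Latin-1 string is not well-formed UTF-8. A strict decode notices that,
# which is the check _utf8_on skips.
my $latin1 = "caf\xE9";
my $copy   = $latin1;                  # decode may modify its argument
my $ok = eval { Encode::decode("UTF-8", $copy, Encode::FB_CROAK()); 1 };
print "Latin-1 input rejected by strict decode\n" unless $ok;
```

So _utf8_on is the cheap choice only when the source of the data
guarantees UTF-8; otherwise decoding is the safer route.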

A wrapper-function for fetchrow_array and fetchrow_hashref when the
database contains only UTF-8:

  sub fetchrow {
    my($self,$sth,$what) = @_; # $what is one of fetchrow_{array,hashref}
    if ($] < 5.007) {
      return $sth->$what;
    } else {
      require Encode;
      if (wantarray) {
        # list context: flag each defined, non-ASCII column value
        my @arr = $sth->$what;
        for (@arr) {
          defined && /[^\000-\177]/ && Encode::_utf8_on($_);
        }
        return @arr;
      } else {
        my $ret = $sth->$what;
        if (ref $ret) {
          # hashref: flag the values in place (keys are left alone)
          for my $k (keys %$ret) {
            defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret->{$k};
          }
          return $ret;
        } else {
          # single scalar, or undef at the end of the data
          defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret;
          return $ret;
        }
      }
    }
  }
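
The same test appears three times in the wrapper. As a sketch, it could
be factored into a helper; mark_utf8 is a hypothetical name, and the row
data below only imitates what a database driver might hand back.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: flag every defined argument that contains
# non-ASCII octets. @_ aliases the caller's variables, so the
# change happens in place.
sub mark_utf8 {
  for (@_) {
    defined && /[^\000-\177]/ && Encode::_utf8_on($_);
  }
  return;
}

# Invented sample data: raw octets, as an unaware driver would return them.
my @row = ("plain", undef, "caf\xC3\xA9");
mark_utf8(@row);

my %rec = (name => "M\xC3\xBCller", id => 42);
mark_utf8($rec{$_}) for keys %rec;

print length($row[2]), " characters in the third column\n";
print length($rec{name}), " characters in the name field\n";
```

Pure ASCII values and undef are left untouched, so only the values that
need it pay for the flag.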


If you have large scalars that you know can only contain ASCII and
might be marked as UTF-8:

  utf8::downgrade($sort) if $] > 5.007;
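
A short sketch of what the downgrade does, with invented sample strings.
The second half uses the optional FAIL_OK argument of utf8::downgrade,
which the line above does not need as long as the data really is ASCII.

```perl
use strict;
use warnings;

my $sort = "only ascii here";
utf8::upgrade($sort);                  # simulate a string marked as UTF-8
my $was_flagged = utf8::is_utf8($sort);

utf8::downgrade($sort) if $] > 5.007;  # same characters, flag removed
print "flag before: ", ($was_flagged ? 1 : 0),
      ", after: ", (utf8::is_utf8($sort) ? 1 : 0), "\n";

# A string with a character above 255 cannot be downgraded. Without the
# second argument the call would die; with it, the call returns false
# and leaves the string alone.
my $wide = "smiley \x{263a}";
my $ok = utf8::downgrade($wide, 1);
print "wide string left untouched\n" unless $ok;
```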

That's all I needed. You are not alone:-)


-- 
andreas
