perl-unicode

Re: Another Unicode s/// buglet?

2002-06-26 10:30:06
On Wed, Jun 26, 2002 at 05:43:07PM +0100, Hugo van der Sanden wrote:
SADAHIRO Tomoyuki <bqw10602(_at_)nifty(_dot_)com> wrote:
:With Perl 5.8.0 RC2 (or plus Change 17353),
:there is something strange.
:
:In $unicode =~ s/$regex/$bytes/,
:$bytes is not upgraded,
:and a malformed Unicode string is generated.
:
:$unicode =~ s/$regex/$bytes/e is ok, though.

As far as I can tell, this is missing code rather than buggy code:
coping with a non-utf8 replacement string does not seem to have
been catered for in this class of cases.
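
For illustration, the reported case can be reproduced with a short script
along these lines. The sample strings and the pattern are made up for this
sketch (the original report does not give them). On a perl with the fix
applied, the replacement should be treated as characters and the result
should be a well-formed string.

```perl
use strict;
use warnings;

# Illustrative values only; none of these come from the original report.
my $unicode = "abc\x{263A}def";  # has a code point above 0xFF, so perl stores it as UTF-8
my $bytes   = "\xE9";            # one Latin-1 character held as a single byte, not UTF-8
my $regex   = qr/b/;

my $result = $unicode;
$result =~ s/$regex/$bytes/;     # byte-string replacement spliced into a UTF-8 target

# What a correct perl should produce: the replacement is treated as U+00E9.
my $expected = "a\x{E9}c\x{263A}def";

print "length: ", length($result), "\n";
print $result eq $expected ? "ok\n" : "not ok: malformed or wrong result\n";
```

The /e form mentioned in the report builds the replacement by evaluating an
expression, which is presumably why it did not hit the same code path.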

Attached patch passes all existing tests here, as well as some new ones.

Patches passing over the Atlantic... I already patched this with
#17358 (and plugged a leak with #17362).  But I gladly took your new
tests :-)

Due to the current RC status, I've taken the simplest approach I could
see, but there may be higher performance alternatives: the upgrade is
done regardless of whether the replacement string is ever needed, and
since it is not done in place, the upgrade will be repeated each time
it is needed. That means if you expect to perform the same substitution
on many utf8 strings, it would probably be faster if you ensure that
the replacement string is utf8.
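
The suggestion above might look like this in practice. This is only a
sketch: the sample data is invented, and whether it gives a measurable
speed-up depends on the perl version in use. utf8::upgrade() is the core
function that switches a string's internal representation to UTF-8 in
place without changing its value.

```perl
use strict;
use warnings;

# Invented sample data for this sketch.
my @targets     = ("x-\x{263A}", "x-\x{3042}", "x-\x{20AC}");  # all stored as UTF-8
my $replacement = "\xE9";                                      # starts out as a byte string

# Upgrade once, up front. The string's value is unchanged; only its
# internal encoding switches to UTF-8.
utf8::upgrade($replacement);

for my $target (@targets) {
    # With the replacement already in UTF-8, the substitution should not
    # need to upgrade a temporary copy on each call.
    $target =~ s/x/$replacement/;
}

print join(",", map { length } @targets), "\n";
```

Because the loop variable aliases the array elements, @targets itself holds
the substituted strings afterwards.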

+         SV* sv = sv_newmortal();
+         SvSetMagicSV(sv, dstr);

Hmmm, I don't do anything special with magic.

+         sv_utf8_upgrade(sv);
+         c = SvPV(sv, clen);
+         doutf8 = TRUE;
+     }

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
