perl-unicode

Re: Encode UTF-8 optimizations

2016-08-22 16:20:20
On 08/22/2016 02:47 PM, pali(_at_)cpan(_dot_)org wrote:
>> And I think you misunderstand when is_utf8_char_slow() is called.  It is
>> called only when the next byte in the input indicates that the only
>> legal UTF-8 that might follow would be for a code point that is at least
>> U+200000, almost twice as high as the highest legal Unicode code point.
>> It is a Perl extension to handle such code points, unlike other
>> languages.  But the Perl core is not optimized for them, nor will it be.
>> My point is that is_utf8_char_slow() will only be called in very
>> specialized cases, and we need not make those cases have as good a
>> performance as normal ones.
> In strict mode there is absolutely no need to call is_utf8_char_slow(),
> because in strict mode such a sequence must always be invalid (it is
> above the last valid Unicode code point). This is what I was trying to
> say.
>
> And currently is_strict_utf8_string_loc() first calls isUTF8_CHAR()
> (which could call is_utf8_char_slow()) and only after that checks
> UTF8_IS_SUPER().

I only have time to respond to this portion just now.

The code could be tweaked to call UTF8_IS_SUPER first, but I'm asserting that an optimizing compiler will see that any call to is_utf8_char_slow() is pointless, and will optimize it out.