(this only applies for strict UTF-8)
On Monday 22 August 2016 23:19:51 Karl Williamson wrote:
> The code could be tweaked to call UTF8_IS_SUPER first, but I'm
> asserting that an optimizing compiler will see that any call to
> is_utf8_char_slow() is pointless, and will optimize it out.
Such an optimization cannot be done; the compiler cannot know such a thing...
You have this code:
+ const STRLEN char_len = isUTF8_CHAR(x, send);
+
+ if ( UNLIKELY(! char_len)
+ || ( UNLIKELY(isUTF8_POSSIBLY_PROBLEMATIC(*x))
+ && ( UNLIKELY(UTF8_IS_SURROGATE(x, send))
+ || UNLIKELY(UTF8_IS_SUPER(x, send))
+ || UNLIKELY(UTF8_IS_NONCHAR(x, send)))))
+ {
+ *ep = x;
+ return FALSE;
+ }
Here the isUTF8_CHAR() macro calls the function is_utf8_char_slow() when
the condition IS_UTF8_CHAR_FAST(UTF8SKIP(x)) is false. Because
is_utf8_char_slow() is an external library function, the compiler has
absolutely no idea what that function does. In C, unlike in a purely
functional language, such a function could have side effects, etc., so
the compiler really cannot eliminate that call.
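To make the fast/slow split concrete, here is a simplified model in C. All names (model_skip, MODEL_CHAR_FAST, model_char_slow, model_is_char) are hypothetical stand-ins for UTF8SKIP, IS_UTF8_CHAR_FAST, is_utf8_char_slow() and isUTF8_CHAR(); overlong, surrogate and noncharacter checks are left out, and a call counter stands in for the side effects the compiler has to assume an opaque function may have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP(): length announced by a start byte
 * (b >= 0xC0), using the Perl-style extended scheme that allows 5+ bytes. */
static size_t model_skip(unsigned char b)
{
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    if (b < 0xFC) return 5;
    if (b < 0xFE) return 6;
    return 7;
}

/* Hypothetical stand-in for IS_UTF8_CHAR_FAST(). */
#define MODEL_CHAR_FAST(n) ((n) <= 4)

/* In perl the slow checker is out of line, so the compiler cannot see its
 * body. This counter models an observable side effect: a call that might
 * have one cannot be removed unless the compiler proves it unreachable. */
static int slow_calls = 0;

static size_t model_char_slow(const unsigned char *s, size_t n)
{
    slow_calls++;
    for (size_t i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;
    return n;
}

/* Simplified model of isUTF8_CHAR(): returns the character length, or 0. */
static size_t model_is_char(const unsigned char *s, const unsigned char *e)
{
    assert(s != NULL && e != NULL);
    if (e <= s) return 0;
    if (s[0] < 0x80) return 1;       /* ASCII */
    if (s[0] < 0xC0) return 0;       /* stray continuation byte */
    size_t n = model_skip(s[0]);
    if ((size_t)(e - s) < n) return 0;
    if (MODEL_CHAR_FAST(n)) {        /* fast path: at most 4 bytes */
        for (size_t i = 1; i < n; i++)
            if ((s[i] & 0xC0) != 0x80) return 0;
        return n;
    }
    return model_char_slow(s, n);    /* slow path: 5 or more bytes */
}
```

A 5-byte sequence such as F8 88 80 80 80 takes the slow path, while a 2-byte character like C3 A9 never does; nothing in the fast path alone lets the compiler rule the slow call out.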
Moving UTF8_IS_SUPER before isUTF8_CHAR() might help, but I'm skeptical
that gcc can really propagate constants from the PL_utf8skip[] array back
and prove that IS_UTF8_CHAR_FAST must always be true when UTF8_IS_SUPER
is false...
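The direction of that implication matters, and a quick arithmetic check shows why (plain C; the helper names are illustrative, not perl's). Any non-overlong sequence longer than 4 bytes encodes at least 0x200000, which is already above U+10FFFF, so "not fast" implies "super" for strict UTF-8. The converse does not hold: the 4-byte sequence F4 90 80 80 decodes to 0x110000, which is super but still takes the fast path.

```c
#include <assert.h>
#include <stdint.h>

/* Largest code point Unicode allows; anything above is "super". */
#define UNICODE_MAX 0x10FFFFu

/* Payload bits carried by an n-byte sequence (n = 1..6, using the original
 * pre-RFC 3629 length scheme that perl's extended UTF-8 builds on). */
static unsigned payload_bits(unsigned n)
{
    assert(n >= 1 && n <= 6);
    return n == 1 ? 7u : 5u * n + 1u;
}

/* Smallest code point an n-byte sequence can encode without being
 * overlong: one more than the largest value that fits in n - 1 bytes. */
static uint64_t min_code_point(unsigned n)
{
    assert(n >= 1 && n <= 6);
    return n == 1 ? 0u : (uint64_t)1 << payload_bits(n - 1);
}

/* Decode a 4-byte sequence (no validation; illustration only). */
static uint32_t decode4(const unsigned char b[4])
{
    return ((uint32_t)(b[0] & 0x07) << 18) |
           ((uint32_t)(b[1] & 0x3F) << 12) |
           ((uint32_t)(b[2] & 0x3F) << 6)  |
            (uint32_t)(b[3] & 0x3F);
}
```

So min_code_point(5) is 0x200000, above UNICODE_MAX, while decode4() of F4 90 80 80 gives 0x110000.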
Instead, add an IS_UTF8_CHAR_FAST(UTF8SKIP(x)) check (or similar) before
the isUTF8_CHAR() call. That should completely prevent the compiler from
generating any call to the is_utf8_char_slow() function.
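A sketch of the suggested ordering, reusing the same kind of hypothetical model (the names are illustrative, not perl's API, and the surrogate and noncharacter checks from the patch are left out). The length guard runs first and rejects anything longer than 4 bytes, which in strict UTF-8 can only be super or overlong; on the remaining path the fast condition is known to hold, so the slow call is never reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for UTF8SKIP() and IS_UTF8_CHAR_FAST(). */
static size_t model_skip(unsigned char b)   /* b >= 0xC0 */
{
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    if (b < 0xFC) return 5;
    if (b < 0xFE) return 6;
    return 7;
}
#define MODEL_CHAR_FAST(n) ((n) <= 4)

static int slow_calls = 0;   /* models the opaque slow path's side effects */

static size_t model_char_slow(const unsigned char *s, size_t n)
{
    slow_calls++;
    (void)s;
    return n;
}

/* Simplified model of isUTF8_CHAR(). */
static size_t model_is_char(const unsigned char *s, const unsigned char *e)
{
    if (e <= s) return 0;
    if (s[0] < 0x80) return 1;
    if (s[0] < 0xC0) return 0;
    size_t n = model_skip(s[0]);
    if ((size_t)(e - s) < n) return 0;
    if (MODEL_CHAR_FAST(n)) {
        for (size_t i = 1; i < n; i++)
            if ((s[i] & 0xC0) != 0x80) return 0;
        return n;
    }
    return model_char_slow(s, n);
}

/* 4-byte sequences above U+10FFFF: F5..F7 xx, or F4 90..BF. */
static bool model_is_super4(const unsigned char *s)
{
    return s[0] > 0xF4 || (s[0] == 0xF4 && s[1] >= 0x90);
}

/* Strict check of one character, following the shape of the patch. */
static bool model_strict_char(const unsigned char *x, const unsigned char *send,
                              const unsigned char **ep)
{
    assert(x != NULL && send != NULL && ep != NULL);
    /* Guard first: a start byte announcing more than 4 bytes is invalid in
     * strict UTF-8, so reject it before model_is_char() is evaluated. After
     * inlining, the compiler can see MODEL_CHAR_FAST(...) holds below and
     * drop the slow-path call. */
    if (x < send && x[0] >= 0xC0 && !MODEL_CHAR_FAST(model_skip(x[0]))) {
        *ep = x;
        return false;
    }
    size_t char_len = model_is_char(x, send);
    if (char_len == 0 || (char_len == 4 && model_is_super4(x))) {
        *ep = x;
        return false;
    }
    return true;
}
```

With this ordering a 5-byte start such as F8 is rejected without touching the slow path, and 4-byte supers like F4 90 80 80 are still caught by the later check.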
With the UTF8_IS_SUPER reordering alone, the binary may still contain a
branch that will never be taken.