Encode UTF-8 optimizations

Hi! As we know utf8::encode() does not provide correct UTF-8 encoding
and Encode::encode("UTF-8", ...) should be used instead. Also opening
file should be done by :encoding(UTF-8) layer instead :utf8.

But UTF-8 strict implementation in Encode module is horrible slow when
comparing to utf8::encode(). It is implemented in Encode.xs file and for
benchmarking can be this XS implementation called directly by:

  use Encode;
  my $output = Encode::utf8::encode_xs({strict_utf8 => 1}, $input)

(without overhead of Encode module...)

Here are my results on 160 bytes long input string:

  Encode::utf8::encode_xs({strict_utf8 => 1}, ...):  8 wallclock secs ( 8.56 
usr +  0.00 sys =  8.56 CPU) @ 467289.72/s (n=4000000)
  Encode::utf8::encode_xs({strict_utf8 => 0}, ...):  1 wallclock secs ( 1.66 
usr +  0.00 sys =  1.66 CPU) @ 2409638.55/s (n=4000000)
  utf8::encode:  1 wallclock secs ( 0.39 usr +  0.00 sys =  0.39 CPU) @ 
10256410.26/s (n=4000000)

I found two bottle necks (slow sv_catpv* and utf8n_to_uvuni functions)
and did some optimizations. Final results are:

  Encode::utf8::encode_xs({strict_utf8 => 1}, ...):  2 wallclock secs ( 3.27 
usr +  0.00 sys =  3.27 CPU) @ 1223241.59/s (n=4000000)
  Encode::utf8::encode_xs({strict_utf8 => 0}, ...):  1 wallclock secs ( 1.68 
usr +  0.00 sys =  1.68 CPU) @ 2380952.38/s (n=4000000)
  utf8::encode:  1 wallclock secs ( 0.40 usr +  0.00 sys =  0.40 CPU) @ 
10000000.00/s (n=4000000)

Patches are on github at pull request:
https://github.com/dankogai/p5-encode/pull/56

I would like if somebody review my patches and tell if this is the
right way for optimizations...