perl-unicode

[Encode] benchmark: UTF-16BE encoding/decoding

2002-04-07 05:27:31
Good morning.

I know NI-S is very concerned with the performance of UCS-2 (or UTF-16BE for the latest Java; it maps \x{10000} or higher as SOB, oops, surrogate pairs), so I decided to run some benchmarks. My latest (yet to be uploaded) Encode::Unicode includes two implementations. *_classic is the same old substr()-based code, while *_modern unpack()s the source string all at once and handles the characters as an array (so more memory is needed). While decode() is consistently about 30% faster, what's more interesting is encode(): the longer the source string, the bigger the gain, from roughly 2x at length=256 to almost 9x at length=4096. This was totally against my first guess, for encode() builds a larger and larger array the longer the source string is.
  Whenever in doubt, benchmark.
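
To make the two strategies concrete, here is a minimal sketch of a substr()-based encoder next to an unpack()-all-at-once encoder. It is an illustration only, not the actual Encode::Unicode code: the names utf16_units, encode_classic and encode_modern are hypothetical, and the 'W*' unpack template is my own choice rather than something stated in the post.

```perl
use strict;
use warnings;

# Turn one code point into its UTF-16 code units:
# a single unit inside the BMP, a surrogate pair at \x{10000} or higher.
sub utf16_units {
    my ($ord) = @_;
    return ($ord) if $ord < 0x10000;
    my $v = $ord - 0x10000;
    return (0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));
}

# "Classic" style (assumed shape): walk the string with substr(),
# packing and appending one character at a time.
sub encode_classic {
    my ($str) = @_;
    my $out = '';
    for my $i (0 .. length($str) - 1) {
        $out .= pack 'n*', utf16_units(ord substr($str, $i, 1));
    }
    return $out;
}

# "Modern" style (assumed shape): unpack() all code points into an
# array first, then pack() the whole list in one call.
sub encode_modern {
    my ($str) = @_;
    my @ords = unpack 'W*', $str;
    return pack 'n*', map { utf16_units($_) } @ords;
}

# Both print the big-endian code units in hex.
print unpack('H*', encode_classic("A\x{10000}")), "\n";   # 0041d800dc00
print unpack('H*', encode_modern("A\x{10000}")), "\n";    # 0041d800dc00
```

The second version holds every code point in @ords at once, which is the extra memory cost mentioned above.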

Dan the Encode Maintainer.

range HIGH means that the source string consists of characters \x{10000} or higher
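
For anyone who wants to produce tables of this shape themselves, the following is a rough sketch of a harness built on Benchmark's cmpthese(). It is not the script that generated the numbers below. The contenders by_substr and by_unpack are hypothetical stand-ins for the two strategies, make_source and its fill characters are arbitrary picks, and the 'W*' template is an assumption.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One code point -> one UTF-16 code unit, or a surrogate pair above the BMP.
sub units {
    my ($ord) = @_;
    return $ord if $ord < 0x10000;
    $ord -= 0x10000;
    return (0xD800 + ($ord >> 10), 0xDC00 + ($ord & 0x3FF));
}

# Hypothetical stand-in: character-by-character via substr().
sub by_substr {
    my ($s) = @_;
    my $out = '';
    $out .= pack 'n*', units(ord substr($s, $_, 1)) for 0 .. length($s) - 1;
    return $out;
}

# Hypothetical stand-in: unpack() everything, then pack() once.
sub by_unpack {
    my ($s) = @_;
    return pack 'n*', map { units($_) } unpack 'W*', $s;
}

# Build a source string of $len characters from the requested range.
sub make_source {
    my ($len, $range) = @_;
    return ($range eq 'HIGH' ? "\x{10000}" : "\x{3042}") x $len;
}

for my $len (256, 1024) {
    for my $range (qw(BMP HIGH)) {
        my $src = make_source($len, $range);
        print "---- encode length=$len/range=$range ----\n";
        # A small fixed iteration count keeps this demo short.
        cmpthese(200, {
            Substr => sub { by_substr($src) },
            Unpack => sub { by_unpack($src) },
        });
    }
}
```

With so few iterations Benchmark may warn that the count is too low to be reliable. For real measurements, pass a negative count such as -3, which tells it to run each contender for at least that many CPU seconds.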

---- encode length=256/range=BMP ----
         Rate Classic  Modern
Classic 275/s      --    -53%
Modern  583/s    112%      --
---- decode length=256/range=BMP ----
         Rate Classic  Modern
Classic 391/s      --    -23%
Modern  508/s     30%      --
---- encode length=256/range=HIGH ----
         Rate Classic  Modern
Classic 286/s      --    -52%
Modern  598/s    109%      --
---- decode length=256/range=HIGH ----
         Rate Classic  Modern
Classic 391/s      --    -23%
Modern  510/s     30%      --
---- encode length=1024/range=BMP ----
          Rate Classic  Modern
Classic 43.1/s      --    -71%
Modern   151/s    250%      --
---- decode length=1024/range=BMP ----
          Rate Classic  Modern
Classic 99.4/s      --    -24%
Modern   130/s     31%      --
---- encode length=1024/range=HIGH ----
          Rate Classic  Modern
Classic 44.1/s      --    -71%
Modern   152/s    244%      --
---- decode length=1024/range=HIGH ----
          Rate Classic  Modern
Classic 99.6/s      --    -23%
Modern   130/s     31%      --
---- encode length=4096/range=BMP ----
          Rate Classic  Modern
Classic 4.19/s      --    -89%
Modern  36.8/s    778%      --
---- decode length=4096/range=BMP ----
          Rate Classic  Modern
Classic 24.7/s      --    -23%
Modern  32.2/s     30%      --
---- encode length=4096/range=HIGH ----
          Rate Classic  Modern
Classic 4.23/s      --    -89%
Modern  37.0/s    774%      --
---- decode length=4096/range=HIGH ----
          Rate Classic  Modern
Classic 24.7/s      --    -23%
Modern  32.2/s     30%      --

  • [Encode] benchmark: UTF-16BE encoding/decoding, Dan Kogai <=