Good morning.
I know NI-S is very concerned with the performance of UCS-2 (or
UTF-16BE for the latest Java; it maps \x{10000} and higher to a SOB,
oops, surrogate pair), so I decided to run some benchmarks. My latest
(yet to be uploaded) Encode::Unicode includes two implementations:
*_classic is the same old substr()-based approach, while *_modern
unpack()s the source string all at once and handles the characters as
an array (so it needs more memory).
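To make the difference concrete, here is a minimal sketch of the two styles, not the actual Encode::Unicode code (the sub names and details are mine): both decode UTF-16BE octets into a Perl character string, the classic one walking the octets two at a time with substr(), the modern one unpack()ing every 16-bit unit up front.

```perl
use strict;
use warnings;

# Classic style: one substr() + unpack() per 16-bit unit.
sub decode_classic {
    my $octets = shift;
    my $str = '';
    for (my $i = 0; $i < length($octets); $i += 2) {
        my $u = unpack 'n', substr($octets, $i, 2);
        if ($u >= 0xD800 && $u <= 0xDBFF) {    # high surrogate
            my $lo = unpack 'n', substr($octets, $i += 2, 2);
            $u = 0x10000 + (($u - 0xD800) << 10) + ($lo - 0xDC00);
        }
        $str .= chr $u;
    }
    return $str;
}

# Modern style: unpack the whole string at once, then walk the array.
sub decode_modern {
    my $octets = shift;
    my @units = unpack 'n*', $octets;    # every 16-bit unit up front
    my @chars;
    while (@units) {
        my $u = shift @units;
        if ($u >= 0xD800 && $u <= 0xDBFF) {    # high surrogate
            my $lo = shift @units;
            $u = 0x10000 + (($u - 0xD800) << 10) + ($lo - 0xDC00);
        }
        push @chars, $u;
    }
    return pack 'U*', @chars;
}
```

The classic version keeps memory flat but pays for a substr() call per character; the modern version trades a temporary array for a single pass through unpack().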
While decode() is consistently 30% faster, what's more interesting is
encode(): the longer the source string, the better it performs. This
was totally against my first guess, since encode() builds a larger and
larger array the longer the source string is.
Whenever in doubt, benchmark.
Dan the Encode Maintainer.
range HIGH means that the source string consists of \x{10000} or higher
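The tables below are Benchmark output. A minimal sketch of how such a comparison can be run with Benchmark's cmpthese() follows; the sub names and the toy workload (extracting 16-bit units from UTF-16BE octets) are my own illustration, not the actual test script.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $octets = "\x00\x41" x 1024;    # 1024 BMP characters in UTF-16BE

# Classic: one substr() + unpack() per unit.
sub units_classic {
    my $o = shift;
    my @u;
    push @u, unpack 'n', substr($o, $_ * 2, 2) for 0 .. length($o) / 2 - 1;
    return @u;
}

# Modern: a single unpack() over the whole string.
sub units_modern { return unpack 'n*', shift }

cmpthese(500, {
    Classic => sub { my @u = units_classic($octets) },
    Modern  => sub { my @u = units_modern($octets) },
});
```

cmpthese() prints a rate table with pairwise percentage differences, which is exactly the format of the results below.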
---- encode length=256/range=BMP ----
Rate Classic Modern
Classic 275/s -- -53%
Modern 583/s 112% --
---- decode length=256/range=BMP ----
Rate Classic Modern
Classic 391/s -- -23%
Modern 508/s 30% --
---- encode length=256/range=HIGH ----
Rate Classic Modern
Classic 286/s -- -52%
Modern 598/s 109% --
---- decode length=256/range=HIGH ----
Rate Classic Modern
Classic 391/s -- -23%
Modern 510/s 30% --
---- encode length=1024/range=BMP ----
Rate Classic Modern
Classic 43.1/s -- -71%
Modern 151/s 250% --
---- decode length=1024/range=BMP ----
Rate Classic Modern
Classic 99.4/s -- -24%
Modern 130/s 31% --
---- encode length=1024/range=HIGH ----
Rate Classic Modern
Classic 44.1/s -- -71%
Modern 152/s 244% --
---- decode length=1024/range=HIGH ----
Rate Classic Modern
Classic 99.6/s -- -23%
Modern 130/s 31% --
---- encode length=4096/range=BMP ----
Rate Classic Modern
Classic 4.19/s -- -89%
Modern 36.8/s 778% --
---- decode length=4096/range=BMP ----
Rate Classic Modern
Classic 24.7/s -- -23%
Modern 32.2/s 30% --
---- encode length=4096/range=HIGH ----
Rate Classic Modern
Classic 4.23/s -- -89%
Modern 37.0/s 774% --
---- decode length=4096/range=HIGH ----
Rate Classic Modern
Classic 24.7/s -- -23%
Modern 32.2/s 30% --