perl-unicode

Comprehensive UTF-8 decoder stress test file

2000-09-11 16:12:27
Just a note for those of you busily coding UTF-8 decoders and other
algorithms operating on UTF-8 data (regexp, etc.), or writing
regression test suites for these:

A comprehensive UTF-8 stress test file, covering pretty much every
conceivable type of malformed UTF-8 sequence, is available as

    UTF-8-test.txt

at

    http://www.cl.cam.ac.uk/~mgk25/ucs/examples/
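
To give a rough idea of what such a regression test might check, here is a
minimal sketch in Python. The byte strings below are illustrative samples
picked for this note, not excerpts from UTF-8-test.txt, and the names
MALFORMED, is_rejected and check_decoder are hypothetical. Python's built-in
strict decoder stands in for the decoder under test; swap in your own.

```python
# Hypothetical samples of malformed UTF-8, chosen for illustration only;
# they are NOT copied from UTF-8-test.txt.
MALFORMED = {
    "lone continuation byte": b"\x80",
    "truncated 2-byte sequence": b"\xc3",
    "overlong 2-byte encoding of '/'": b"\xc0\xaf",
    "overlong 3-byte encoding of NUL": b"\xe0\x80\x80",
    "encoded UTF-16 surrogate U+D800": b"\xed\xa0\x80",
    "code point beyond U+10FFFF": b"\xf4\x90\x80\x80",
    "byte that never occurs in UTF-8": b"\xff",
}


def is_rejected(data: bytes) -> bool:
    """Return True if the decoder under test refuses the input.

    Assumption: Python's strict built-in UTF-8 decoder plays the role of
    the decoder being tested.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return True
    return False


def check_decoder(samples):
    """Map each sample name to whether the decoder rejected it."""
    return {name: is_rejected(raw) for name, raw in samples.items()}


if __name__ == "__main__":
    for name, rejected in check_decoder(MALFORMED).items():
        status = "rejected" if rejected else "ACCEPTED (possible bug)"
        print(f"{name}: {status}")
```

A decoder that accepts any of these inputs, for example by quietly turning
the overlong sequence into '/', deserves a closer look, since that kind of
leniency can let data slip past checks done on the raw bytes.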

Happy decoder crashing ...

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>