Just a note for those of you busily coding UTF-8 decoders and other
algorithms operating on UTF-8 data (regexp, etc.), or writing
regression test suites for these:
There is a comprehensive UTF-8 stress test file, covering pretty much
every conceivable type of malformed UTF-8 sequence, available as
UTF-8-test.txt
at
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/
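
For a regression suite, one rough way to use such a file is to feed it
line by line to the decoder under test and record which lines it
rejects. The sketch below is only an illustration: it uses Python's
built-in strict UTF-8 decoder as a stand-in for your own decoder, the
function name find_malformed_lines is made up for this example, and the
sample bytes are a few hand-picked cases (a lone continuation byte, an
overlong encoding, an encoded surrogate), not the contents of the test
file itself.

```python
def find_malformed_lines(data):
    """Return the 1-based numbers of lines that are not valid UTF-8.

    Hypothetical helper: Python's strict decoder stands in for the
    decoder you actually want to test.
    """
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad


# Hand-picked sample, one case per line:
#   1: valid Greek text
#   2: lone continuation byte 0x80
#   3: overlong encoding of '/' (0xC0 0xAF)
#   4: UTF-8-encoded surrogate U+D800 (0xED 0xA0 0x80)
#   5: plain ASCII
sample = (
    b"\xce\xba\xe1\xbd\xb9\xcf\x83\xce\xbc\xce\xb5\n"
    b"\x80\n"
    b"\xc0\xaf\n"
    b"\xed\xa0\x80\n"
    b"plain ASCII\n"
)

print(find_malformed_lines(sample))  # prints [2, 3, 4]
```

To run it against the real file, read a local copy in binary mode
(open(path, "rb").read()) and pass the bytes in. A decoder that
crashes, or that silently accepts the overlong and surrogate cases,
is the kind of bug this is meant to catch.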
Happy decoder crashing ...
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>