Finally: The Unicode 3.0.1 standard changes the definition of UTF-8 so
that a conforming decoder must signal overlong sequences as an error
condition. This is what we have long recommended anyway for security
reasons:
http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html
Please check all your decoders. Test cases are at:
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
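
For anyone unsure what such a check looks like, here is a minimal
sketch in Python. It is only an illustration, not text from the
corrigendum: the names decode_utf8_char and MIN_CODE_POINT are made up
for this example, and it assumes sequences of at most four bytes. It
does not check for surrogates or for values above U+10FFFF, which a
complete decoder would also have to consider.

```python
# Minimal sketch: decode one UTF-8 sequence and reject overlong forms.
# decode_utf8_char and MIN_CODE_POINT are hypothetical names used only
# for illustration; sequences longer than four bytes are not handled.

# Smallest code point that legitimately needs a sequence of each length.
MIN_CODE_POINT = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_char(data, pos=0):
    """Return (code_point, next_pos); raise ValueError on malformed input."""
    if pos >= len(data):
        raise ValueError("no data")
    first = data[pos]
    # Determine the sequence length and payload bits from the lead byte.
    if first < 0x80:
        length, value = 1, first
    elif 0xC0 <= first <= 0xDF:
        length, value = 2, first & 0x1F
    elif 0xE0 <= first <= 0xEF:
        length, value = 3, first & 0x0F
    elif 0xF0 <= first <= 0xF7:
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if pos + length > len(data):
        raise ValueError("truncated sequence")
    # Each continuation byte must look like 10xxxxxx and adds six bits.
    for byte in data[pos + 1:pos + length]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    # The overlong check: the value must actually need this many bytes.
    if value < MIN_CODE_POINT[length]:
        raise ValueError("overlong sequence")
    return value, pos + length
```

With this sketch, decode_utf8_char(b"/") accepts the one-byte slash,
while the two-byte form b"\xc0\xaf" and the three-byte form
b"\xe0\x80\xaf", which encode the same U+002F, both raise ValueError.
Accepting such forms is the security problem: a filter that looks only
for the byte 0x2F would miss them.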
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>