Re: UTF-8 (strict) appears borken


I have prepared a bug report, as below.

I don't want to waste everybody's time if this is thought to be a feature...


...so if anyone thinks this is not a bug, please shout (soon).

Thanks,

Chris

-----------------------------------------------------------------
[Please enter your report here]

Encode::encode('UTF-8', $foo) and Encode::decode('UTF-8', $bar) detect the
Unicode 'non-character' U+FFFF and treat it as an error.

There are 65 other Unicode non-characters:

  U+FFFE
  U+01FFFE, U+02FFFE, U+03FFFE, ... U+10FFFE
  U+01FFFF, U+02FFFF, U+03FFFF, ... U+10FFFF
  U+FDD0..U+FDEF

which one would expect to be treated the same as U+FFFF.

They aren't.  They are accepted as normal characters.

This appears to be a bug.
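
For anyone who wants to check this on their own installation, something
along the following lines should do. It is only a sketch: the helper
noncharacters() is a name made up for this example, FB_CROAK is used so
that a rejection shows up as an exception, and the UTF-8 octets for the
decode test are produced with utf8::encode() so that no strict check is
applied beforehand. What it prints will depend on the versions of perl
and Encode in use.

```perl
use strict;
use warnings;
no warnings 'utf8';    # some perls warn about non-characters; not relevant here
use Encode ();

# Hypothetical helper: return all 66 Unicode non-characters as code points.
sub noncharacters {
    my @cps = (0xFDD0 .. 0xFDEF);      # the 32 in U+FDD0..U+FDEF
    for my $plane (0x00 .. 0x10) {     # U+xxFFFE and U+xxFFFF in each of 17 planes
        push @cps, ($plane << 16) | 0xFFFE, ($plane << 16) | 0xFFFF;
    }
    return @cps;
}

for my $cp (noncharacters()) {
    my $chars = chr($cp);
    my $bytes = $chars;
    utf8::encode($bytes);              # lax conversion to UTF-8 octets

    # With FB_CROAK, a rejected character makes encode/decode die.
    my $enc_ok = eval { Encode::encode('UTF-8', $chars, Encode::FB_CROAK()); 1 };
    my $dec_ok = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 };

    printf "U+%04X  encode: %s  decode: %s\n", $cp,
        ($enc_ok ? 'accepted' : 'rejected'),
        ($dec_ok ? 'accepted' : 'rejected');
}
```

If all non-characters were treated alike, every line of output would
show the same result as the line for U+FFFF.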


[Please do not change anything below this line]
-----------------------------------------------------------------



--
Chris Hall               highwayman.com
