Ralph Corderoy <ralph(_at_)inputplus(_dot_)co(_dot_)uk> writes:
> U+0081 as 0x81 is ‘is a character representable as an unsigned char’ for
> it's a character, U+0081, and unsigned char holds [0, 0x100) so it
> suffers no loss of representation as an unsigned char.
Sure, but then what you are feeding the function is *not* UTF-8.
UTF-8 requires two bytes (0xC2 0x81) to represent that code point. What
you're describing is ISO 8859-1, which is a perfectly fine
single-byte encoding, as long as you don't need any characters
outside the common western-European languages.
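To make the two-byte point concrete, here is a minimal sketch of a UTF-8
encoder. The name utf8_encode is hypothetical (it is not from any library),
and for brevity it does not reject surrogate code points:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: encode one code point as
   UTF-8 into out[4] and return the number of bytes written.
   Assumption: the caller never passes a surrogate (U+D800..U+DFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);             /* beyond the Unicode range */

    if (cp < 0x80) {                    /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* U+0080..U+07FF: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* U+0800..U+FFFF: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* U+10000..U+10FFFF: four bytes */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Calling utf8_encode(0x81, buf) returns 2 and leaves 0xC2 0x81 in buf, so a
lone 0x81 byte is not the UTF-8 form of U+0081; it is what a single-byte
encoding such as ISO 8859-1 would store.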
Or to put it another way: yes, you can claim that only code points
up to U+00FF can be passed to these functions, but that hobbles things
to the point where you really shouldn't claim to be Unicode-aware
at all.
I think it's more sensible to consider that per spec, the <ctype.h>
functions can only deal with single-byte encodings; if you want
something more flexible, you have to go to <wctype.h>.
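As a rough sketch of that split, the two families look like this side by
side. The wrapper names byte_is_alpha and wide_is_alpha are made up for
illustration; isalpha() and iswalpha() are the standard functions:

```c
#include <ctype.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical wrapper: classify one byte of a single-byte encoding.
   The cast matters because the <ctype.h> functions take an int whose
   value must be representable as unsigned char (or be EOF). */
static bool byte_is_alpha(char c)
{
    return isalpha((unsigned char)c) != 0;
}

/* Hypothetical wrapper: classify one wide character. The argument is a
   whole wchar_t, so it is not limited to the unsigned char range. */
static bool wide_is_alpha(wchar_t wc)
{
    return iswalpha((wint_t)wc) != 0;
}
```

For ASCII letters both wrappers agree. What either one reports for
non-ASCII values depends on the locale selected with setlocale(), so no
particular result should be assumed there.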
regards, tom lane