Re: [Nmh-workers] A € for your thoughts - should we fix UTF-8 subject output in scan for 1.5?

2012-05-21 10:07:49
Ken Hornstein <kenh(_at_)pobox(_dot_)com> writes:
> My question back to you: do the is* functions take bytes, or
> characters?  If they take bytes, then I agree with you.  If they
> take characters ... well, I'm not sure what is right.

Quoting POSIX:2008, for isalnum and friends:

        The c argument is an int, the value of which the application
        shall ensure is a character representable as an unsigned char or
        equal to the value of the macro EOF. If the argument has any
        other value, the behavior is undefined.

So these functions only work portably in single-byte encodings.
Particular implementations might choose to make them do something useful
for input values above 255, but you couldn't expect that to work
everywhere.  To work portably in UTF-8 and other multi-byte encodings,
you have to go over to the wide-character functions in <wctype.h>.
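
To illustrate the difference, here is a minimal sketch (not nmh code;
the function names byte_is_alpha and count_alpha_chars are made up for
this example).  The first helper stays within what POSIX allows for
isalpha() by casting to unsigned char; the second decodes multi-byte
characters with mbrtowc() and classifies them with iswalpha(), on the
assumption that LC_CTYPE has already been set to the intended locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-oriented check: the cast guarantees the argument is
 * representable as an unsigned char, as POSIX requires. */
static int byte_is_alpha(char c)
{
    return isalpha((unsigned char)c) != 0;
}

/* Character-oriented count: decode one multi-byte character at a time
 * under the current LC_CTYPE locale and classify it with iswalpha().
 * Returns the number of alphabetic characters, or -1 if the string
 * contains an invalid or truncated multi-byte sequence. */
static long count_alpha_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    long count = 0;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                  /* invalid or incomplete sequence */
        if (n == 0)
            break;                      /* decoded a null character */
        if (iswalpha((wint_t)wc))
            count++;
        s += n;
        left -= n;
    }
    return count;
}
```

A caller would normally run setlocale(LC_CTYPE, "") once at startup so
that mbrtowc() decodes according to the user's locale; without that
call the program stays in the "C" locale, where only ASCII is
guaranteed to decode.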

                        regards, tom lane
