
Re: [Nmh-workers] A € for your thoughts - should we fix UTF-8 subject output in scan for 1.5?

2012-05-21 13:55:02
Ken Hornstein <kenh(_at_)pobox(_dot_)com> writes:
>> So these functions only work portably in single-byte encodings.
>> Particular implementations might choose to make them do something useful
>> for input values above 255, but you couldn't expect that to work
>> everywhere.  To work portably in UTF8 and other multi-byte encodings,
>> you have to go over to the wide-character functions in <wctype.h>.

> Yeah, but the issue isn't about values above 255, it's about values above
> 127.  Your locale is UTF-8, and you call isspace(0xa0).  Does that mean
> "the character 0xa0", which is U+00A0 (a space)?  Or does it mean
> one byte of a multibyte character, in which case ... who knows?

Well, I would say that the standard's authors wrote "character" with
malice aforethought, and that what they meant was that the value had to
represent a character, not one byte of a multibyte character.  So if
isspace(0xa0) means anything in UTF8 encoding, it would have to refer
to the Unicode code point U+00A0.  However, in practice I'm not sure
what good it does you to worry about whether or not that works, because
if you want to support anything beyond LATIN1 you need to be using
iswspace() anyway.
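
For illustration, here is a minimal sketch of the wide-character approach,
assuming the input is a NUL-terminated multibyte string in the encoding of
the current LC_CTYPE locale.  The helper name count_wide_spaces() is
hypothetical and is not part of nmh; it decodes each character with
mbrtowc() and classifies the result with iswspace(), so no byte of a
multibyte sequence is ever handed to isspace().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Count the whitespace characters in a multibyte string, decoding it
 * according to the current LC_CTYPE locale.  Returns -1 if the string
 * contains an invalid or truncated multibyte sequence.
 *
 * Hypothetical helper for illustration only; not an nmh function.
 */
long count_wide_spaces(const char *s)
{
    mbstate_t st;
    size_t left;
    long count = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;              /* invalid or incomplete sequence */
        if (n == 0)
            break;                  /* decoded a NUL; strlen() rules this out */
        if (iswspace((wint_t)wc))   /* classify the whole character */
            count++;
        s += n;
        left -= n;
    }
    return count;
}
```

A caller would normally run setlocale(LC_CTYPE, "") once at startup so that
mbrtowc() follows the user's encoding.  In a UTF-8 locale the bytes 0xC2
0xA0 then decode to the single code point U+00A0; whether iswspace()
counts that code point as a space is up to the locale definition, so the
sketch makes no assumption about it.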

                        regards, tom lane

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
