Re: [Nmh-workers] char args passed to isalpha() et al.

2007-04-11 03:32:09
Joel Reicher wrote:
isalpha() will cope with a negative int if EOF is such a number. That's
why it takes an int to begin with.

The C standard says:
# In all cases the argument is an int, the value of which shall
# be representable as an unsigned char or shall equal the value of
# the macro EOF. If the argument has any other value, the behaviour
# is undefined.
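
As a rough illustration of that rule, the check below spells out which int values are legal arguments. The names in_ctype_domain() and checked_isalpha() are hypothetical and not part of nmh; this is only a debugging sketch, assuming one wants an out-of-range argument to trip an assert instead of invoking undefined behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: non-zero if v is a legal argument for the
 * <ctype.h> functions, i.e. EOF or a value representable as an
 * unsigned char. */
static int in_ctype_domain(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Hypothetical debugging wrapper: aborts (unless NDEBUG is defined)
 * when the caller passes a value outside the permitted domain. */
static int checked_isalpha(int v)
{
    assert(in_ctype_domain(v));
    return isalpha(v) != 0;
}
```

With a wrapper of this kind, passing a plain char with the top bit set on a signed-char system would typically fail the assert, which makes the bug visible rather than silent.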

If the original char is negative, that's a slightly different problem. More
below.

So the cleanest fix is probably to make sure that we're using 'unsigned
char *' in all the places where it matters.  The simplest fix is to use
  isalpha((unsigned char)*p)
(i.e. cast to unsigned char at the point of use).
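
A minimal sketch of the cast-at-point-of-use approach, assuming a plain 'char *' string. The function count_alpha() is a hypothetical example and is not taken from the nmh sources.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the alphabetic bytes in the
 * NUL-terminated string s. */
static size_t count_alpha(const char *s)
{
    const char *p;
    size_t n = 0;

    assert(s != NULL);
    for (p = s; *p != '\0'; p++) {
        /* Cast at the point of use, so the argument is always in the
         * range 0..UCHAR_MAX regardless of whether char is signed. */
        if (isalpha((unsigned char)*p))
            n++;
    }
    return n;
}
```

The string itself stays a 'char *', so callers do not need to change; only the argument handed to isalpha() is converted.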

Casting to int will kill the warning but leave the problem in place, so
it's the wrong approach.

A conversion from char to int is not the same as going via unsigned char.
The former does a sign extension, the latter does not (on my machine,
anyway).

Yes, exactly. Going from char to int is wrong. For correct behaviour
we must go via unsigned char so that we don't do the sign extension.
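
The difference between the two conversions can be sketched as follows. The function names are hypothetical, and whether via_int() ever returns a negative value depends on whether plain char is signed on the platform in question.

```c
#include <assert.h>
#include <limits.h>

/* Direct conversion: the value is preserved, so where plain char is
 * signed a byte with the top bit set arrives as a negative int. */
static int via_int(char c)
{
    return (int)c;
}

/* Conversion via unsigned char: the value is first reduced modulo
 * UCHAR_MAX + 1, so the result is never negative. */
static int via_uchar(char c)
{
    int r = (int)(unsigned char)c;

    /* Postcondition: r is a legal argument for isalpha() and friends. */
    assert(r >= 0 && r <= UCHAR_MAX);
    return r;
}
```

For 7-bit characters the two functions agree, which is why the problem only shows up once bytes with the 8th bit set appear in the input.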

The nmh code must, at the moment, be assuming that these chars are 7-bit
because it's doing a direct (implicit) conversion from char to int.

Quite likely. That may have been true when the code was written, but
it's entirely likely that we'll run into a char with the 8th bit set
these days...

Since conversion via unsigned char is not bug-for-bug identical, I don't
want to drop that in and introduce new behaviour.

The whole point is that we want to introduce new, bugfixed behaviour.
(It's only new behaviour on systems where char is signed, of course).

If I were going to change everything to unsigned char, and I agree that's
the right solution, it wouldn't be as a cast but as a change in everything
going back to the point at which the data is read in.

I'm quite prepared to do that if everyone likes the sound of it.

I'm not totally sure it's worth the effort. I do think that we should
either do that, or do the cast to (unsigned char) wherever we pass a
char to isalpha() and the other <ctype.h> functions. Anything else is
leaving the code buggy.

-- PMM


_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
http://lists.nongnu.org/mailman/listinfo/nmh-workers