nmh-workers
[Top] [All Lists]

Re: [Nmh-workers] sbr/fmt_scan.c: cpstripped(): assert(w >= 0) Failure.

2017-04-26 04:20:35
Hi David,

Your patch looks good.

I've learnt how to only push the first few commits too.  :-)

I noticed that the altstr = NULL; in cptrimmed(), line 197, is inside
the w >=0 block.  I think it should be outside.

The else block for that condition effectively returns so the un-NULL'd
altstr isn't used again.  (They should be swapped.  :-)

Taking the sane ASCII → ISO-8859-1 → UTF-8 route, I avoided all this
libc wide-character stuff, so perhaps I'm misunderstanding the code.
Here's the no MULTIBYTE_SUPPORT version.

    static void
    cpstripped (charstring_t dest, size_t max, char *str)
    {
        int prevCtrl = 1;   /* This is 1 so we strip out leading spaces */
        int len;
                
        if (!str)
            return;

        len = strlen(str);
        while (*str != '\0' && len > 0 && max > 0) {
            int c = (unsigned char) *str;
            len--;
            if (iscntrl(c) || isspace(c)) {
                str++;
                if (! prevCtrl) {
                    charstring_push_back (dest, ' ');
                    --max;
                }
                prevCtrl = 1;
                continue;
            }

            prevCtrl = 0;
            charstring_push_back (dest, *str++);
            --max;
        }
    }

Could be clearer, but it's probably like this so the MULTIBYTE_SUPPORT
version can interleave.  Perhaps it's the whole function that should be
#ifdef'd rather than snippets?

Now the MULTIBYTE_SUPPORT cut with my questions.

    static void
    cpstripped (charstring_t dest, size_t max, char *str)
    {
        int prevCtrl = 1;   /* This is 1 so we strip out leading spaces */
        int len;
        int char_len, w;
        wchar_t wide_char;
        char *altstr = NULL;

        if (!str)
            return;

        len = strlen(str);

        if (mbtowc(NULL, NULL, 0)) {}  /* Reset shift state */

        while (*str != '\0' && len > 0 && max > 0) {
            char_len = mbtowc(&wide_char, str, len);

            /*
             * If mbrtowc() failed, then we have a character that isn't valid

The comment talks of mbrtowc(), but the code uses mbtowc() all three
times.  This was a bit confusing because I used `K' in vim to read the
comment's man page.  :-)

             * in the current encoding, or len wasn't enough for the whole
             * multi-byte rune to be read.  Replace it with a '?'.  We do that 
by
             * setting the alstr variable to the value of the replacement 
string;
             * altstr is used below when the bytes are copied into the output
             * buffer.
             */
            if (char_len < 0) {
                altstr = "?";
                char_len = mbtowc(&wide_char, altstr, 1);
            }

When the previous mbtowc() returned negative, could its internal shift
state have been modified?  Can that mean it can't parse "?" and we
should reset it before trying?  Or that it parses "?" as something other
than question mark?  mbrtowc(3), the function we're not using, does
suggest this can happen, and my reading of mbtowc(3), non-`r', is that
resetting its state will indicate if it had some so I think this is a
bug.

            if (char_len <= 0) {
                break;
            }
            len -= char_len;

            if (iswcntrl(wide_char) || iswspace(wide_char)) {
                str += char_len;
                if (! prevCtrl) {
                    charstring_push_back (dest, ' ');
                    --max;
                }

                prevCtrl = 1;
                continue;
            }

            prevCtrl = 0;

            w = wcwidth(wide_char);
            assert(w >= 0);
            if (max >= (size_t) w) {
                charstring_push_back_chars (dest, altstr ? altstr : str, 
char_len, w);
                max -= w;
                str += char_len;

If we're using altstr's "?" then str, that started with something
mbtowc() couldn't handle, is bumped on by one.  That seems reasonable,
but again suggests we should reset its state?

                altstr = NULL;
            } else {
                /* Not enough width available for the last character.  Output
                   space(s) to fill. */
                while (max-- > 0) {
                    charstring_push_back (dest, ' ');
                }
                break;
            }
        }
    }

Can we axe the MULTIBYTE_SUPPORT configure test for <wchar_t.h>,
<wctype.h>, mbtowc(), and wcwidth() as they're all POSIX?  Removing the
#else code would ease reading.

Also, altstr is misspelled as alstr in the commend in cpstripped().

Have added a to-do item to spell-check comments, with that as a sample
to test my shell skills.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers

<Prev in Thread] Current Thread [Next in Thread>