I was thinking of looking for ANSI sequences and not counting
them. But I don't know if that could get into trouble with
multibyte characters. mbtowc() is too much of a mystery to me.
Well, this is where things get "funky".
In the particular case of UTF-8, the only magical bytes are the ones
with the high bit set. Bytes < 128 are plain ASCII and are handled
"normally". So assuming you're using the "normal" 7-bit ANSI escape
sequences (and you're not using 0x9b as a CSI), the multibyte routines
won't treat them specially: every byte in them is just an ordinary
one-byte character.
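
To illustrate the point about the high bit, here is a rough sketch of a
byte-wise scan for a 7-bit CSI sequence. csi_length() is a hypothetical
helper made up for this example, not anything in nmh, and it only knows
about the ESC '[' form. Since every byte of a multibyte UTF-8 character
is >= 0x80, a scan like this should never match in the middle of one.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of nmh): if s starts with a 7-bit CSI
 * sequence, i.e. ESC '[' followed by parameter bytes, intermediate
 * bytes and one final byte, return its length in bytes.  Otherwise
 * return 0.  s must be NUL-terminated.
 */
static size_t
csi_length(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t i = 2;

    if (p[0] != 0x1b || p[1] != '[')
        return 0;

    while (p[i] >= 0x30 && p[i] <= 0x3f)    /* parameter bytes */
        i++;
    while (p[i] >= 0x20 && p[i] <= 0x2f)    /* intermediate bytes */
        i++;

    if (p[i] >= 0x40 && p[i] <= 0x7e)       /* final byte */
        return i + 1;

    return 0;                               /* truncated or malformed */
}
```

Every byte this accepts is below 0x80, which is why it cannot collide
with the lead or continuation bytes of a UTF-8 character. It does not
handle the 8-bit CSI (0x9b) or other escape types.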
If you care, what we do in fmt_scan with the multibyte routines is this:
- We use mbtowc() to convert a possible multibyte character (example:
  anything in UTF-8 U+0080 or greater) into a "wide" character.
- mbtowc() tells us the number of bytes that character consumed. For ASCII,
it's always 1. For UTF-8, sometimes it's > 1. If we don't have enough
room in the buffer for a complete character, we stop.
- We use wcwidth() to see how many columns that character consumes, and
use that to make sure we don't overrun our field width.
- We then copy over the bytes for that character (using the byte count
  that we got from mbtowc()).
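
The steps above could be sketched roughly like this. This is not the
actual fmt_scan() code; copy_columns() and its parameters are names
invented for the example, and treating a negative wcwidth() result as
zero columns is an assumption of the sketch. For non-ASCII input it
also assumes the caller has already done setlocale(LC_CTYPE, "").

```c
#define _XOPEN_SOURCE 700       /* needed for wcwidth() on some systems */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not the real fmt_scan() logic: copy whole
 * characters from src into dst until we run out of columns (max_cols)
 * or buffer space (dst_size, including the trailing NUL).
 * Returns the number of columns used.
 */
static int
copy_columns(char *dst, size_t dst_size, const char *src, int max_cols)
{
    size_t used = 0;
    size_t left = strlen(src);
    int cols = 0;

    if (dst_size == 0)
        return 0;

    (void) mbtowc(NULL, NULL, 0);       /* reset the conversion state */

    while (left > 0) {
        wchar_t wc;
        int w;
        int n = mbtowc(&wc, src, left); /* bytes this character uses */

        if (n <= 0)
            break;                      /* invalid sequence or NUL */
        if (used + (size_t) n >= dst_size)
            break;                      /* no room for a whole character */

        w = wcwidth(wc);                /* columns this character uses */
        if (w < 0)
            w = 0;                      /* assumption: unprintable = 0 */
        if (cols + w > max_cols)
            break;                      /* would overrun the field width */

        memcpy(dst + used, src, (size_t) n);
        used += (size_t) n;
        src += n;
        left -= (size_t) n;
        cols += w;
    }

    dst[used] = '\0';
    return cols;
}
```

For plain ASCII each character is one byte and one column, so
copy_columns(buf, sizeof buf, "hello world", 5) leaves "hello" in buf.
In a UTF-8 locale the byte count and the column count can differ, which
is the whole reason for tracking them separately.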
But it occurs to me that we shouldn't actually do any of this for a "don't
count this" format escape, because that stuff should live outside of
the normal string handling routines in fmt_scan(). Also, I'm with Tom that
I'm not so crazy about putting knowledge of ANSI escape sequences directly
into fmt_scan(), because who knows if your terminal supports them?
David, do you want to implement this?
--Ken