Hi Ken,
(3) assume charset=utf-8 (maybe allow this to be overridden in
profile)
We already do (1) and (2). (3) is the problem. Other people who have
thoughts on this topic are free to weigh in. Personally, I believe
that if you're doing LANG=C, you shouldn't be dealing with any 8-bit
characters at all. Isn't that what that means?
Agreed. I eventually moved from LC_ALL=C to LANG=en_GB.utf8 and it
isn't too painful these days. GNU grep and others have worked on the
performance hit they had initially and for those times when I do want,
e.g. sort(1), to be in the C locale I use
$ cat ~/bin/C
#! /bin/sh
# LC_ALL has precedence over LANG according to POSIX[1], but we may as
# well stamp out any traces by setting LANG too.
# 1. The Open Group Base Specifications, Ch. 8 Environment Variables.
LC_ALL=C LANG=C exec -- "$@"
$
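As a quick illustration of what the wrapper buys you, here is a
minimal sketch that inlines it as a shell function so it runs on its
own. The name c_locale is made up for this example; it is not part of
the script above.

```shell
# Hypothetical stand-in for ~/bin/C, written as a function so the
# example is self-contained.  The assignments apply only to the
# command being run.
c_locale() {
    LC_ALL=C LANG=C "$@"
}

# In the C locale sort(1) compares bytes, so upper case sorts before
# lower case regardless of the locale the caller is using.
printf 'b\nA\na\nB\n' | c_locale sort
```

That should print A, B, a, b, one per line, whereas a UTF-8 locale
will typically interleave the cases.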
BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a
goer. Valid UTF-8 and valid GB2312 can share the same byte sequences,
especially if it's just the odd `£' or `拢' in otherwise ASCII text.
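To make the overlap concrete, here is a rough sketch that assumes
iconv(1) is installed and knows the GB2312 charset name, which may not
hold on every system. The two bytes 0xC2 0xA3 decode cleanly under
both encodings, so a validity check alone cannot tell them apart.

```shell
# The two bytes 0xC2 0xA3, written as octal escapes.
# Read as UTF-8 they are U+00A3 POUND SIGN.
printf '\302\243' | iconv -f UTF-8 -t UTF-8
echo

# Read as GB2312 the very same bytes are one Chinese character,
# U+62E2, which iconv re-encodes here as UTF-8 for display.
printf '\302\243' | iconv -f GB2312 -t UTF-8
echo
```

Both iconv calls should succeed, which is the point: neither
interpretation is rejected as malformed.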
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers