Hi Norm,
> So you are saying that "normal unix commands", such as grep, wc, tr
> etc, do or someday the GNU versions will, know about UTF-8, at least
> for file contents,
Yes, they do, today. And have done for quite a while. You need your
environment variables set up properly so `locale' reports UTF-8 (or
`utf8'). Then...
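"Set up properly" mostly comes down to which variable wins. The sketch below mirrors the POSIX precedence for the character-encoding category: a non-empty LC_ALL overrides LC_CTYPE, which overrides LANG, and with none set the "C" locale applies. The helper names `effective_ctype` and `looks_utf8` are hypothetical, and the codeset check only inspects the locale name. `locale charmap` remains the authoritative test of what the C library actually loaded.

```python
def effective_ctype(env):
    """Return the locale name governing character encoding (LC_CTYPE).

    POSIX precedence: non-empty LC_ALL, then LC_CTYPE, then LANG;
    if none is set, the "C" (POSIX) locale applies.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:  # empty values are treated as unset
            return value
    return "C"


def looks_utf8(locale_name):
    """Heuristic only: does the name's codeset suffix say UTF-8?"""
    # Locale names look like language_TERRITORY.codeset@modifier;
    # both "UTF-8" and "utf8" spellings are common.
    _, _, rest = locale_name.partition(".")
    codeset = rest.split("@", 1)[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    import os
    name = effective_ctype(os.environ)
    print(name, looks_utf8(name))
```

For example, LANG=en_GB.UTF-8 alone gives a UTF-8 locale, but adding LC_ALL=C switches everything back to byte semantics. That is exactly what the `LC_ALL=C ls` example further down relies on.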
$ grep -i roman chars
Roman numerals Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ
$ grep £ chars
Currency £ € cent-¢
$ grep -i roman chars | sed -r 's/.*(.)/\1/'
Ⅿ
$ grep -i roman chars | sed -r 's/.*(.)/\1/' | hd
00000000 e2 85 af 0a |....|
00000004
$
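The hexdump shows why the sed result matters. In a UTF-8 locale, `.` matched the whole three-byte character Ⅿ (U+216F, bytes e2 85 af; the trailing 0a is the newline). Under a byte-oriented C locale, the same regex would capture only the final byte. A minimal Python sketch of the two views, using the line from the transcript:

```python
import unicodedata

line = "Roman numerals Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ"
encoded = line.encode("utf-8")

# Character view (like sed in a UTF-8 locale): "." is one character.
last_char = line[-1]
# Byte view (roughly sed under LC_ALL=C): "." is one byte.
last_byte = encoded[-1:]

print(last_char, hex(ord(last_char)), unicodedata.name(last_char))
print(last_char.encode("utf-8").hex(" "))  # same bytes hd showed, minus the newline
print(last_byte.hex())                     # just the final continuation byte
```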
> if not for file names?
The Unix kernel stores a filename as a run of bytes; any byte except `/'
and NUL may appear in it. The kernel itself places no interpretation on
those bytes. Userspace programs can, but two users might see different
names for the same file, just as they might `see' the same text file
differently if they assume its bytes are in different encodings.
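To make "a run of bytes" concrete, the sketch below (assuming a Unix-like system) creates a file whose name is the UTF-8 encoding of pound-£ and lists the directory with a bytes path, so the OS hands back the raw name. Decoding those identical bytes as UTF-8 and as Latin-1 yields two different names for the same file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    dirb = os.fsencode(d)                 # work with bytes paths throughout
    name = "pound-£".encode("utf-8")      # b'pound-\xc2\xa3'
    open(os.path.join(dirb, name), "wb").close()
    (raw,) = os.listdir(dirb)             # bytes in, raw bytes out

print(raw)                    # the bytes the kernel stored
print(raw.decode("utf-8"))    # a UTF-8 user sees: pound-£
print(raw.decode("latin-1"))  # a Latin-1 user sees: pound-Â£
```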
$ >pound-£
$ ls
pound-£
$ LC_ALL=C ls
pound-??
$
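The two question marks appear because £ is two bytes (c2 a3) in UTF-8, and in the C locale neither byte is a printable character. A simplified model of that substitution follows. The function name is hypothetical, and it only approximates ls's behaviour when writing to a terminal; other ls versions or quoting styles may display the name differently.

```python
def c_locale_display(name: bytes) -> str:
    """Approximate how a C-locale program shows a raw filename on a tty.

    Only printable ASCII (0x20-0x7e) counts as printable in the C locale,
    so every other byte is replaced by a single "?".
    """
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in name)


print(c_locale_display("pound-£".encode("utf-8")))  # pound-??
```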
But really, these days, the whole world is UTF-8. Unless it's Microsoft
with their backwards backwards-compatibility view of the world, and no
one cares about them.
Cheers, Ralph.
_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers