
Re: [Nmh-workers] I like neither green eggs and ham nor MIME

2014-07-18 13:40:22
Hi,

Norm wrote:
> I am not at all secure about how the standard GNU utilities will
> handle non-ASCII characters.  For example, 'wc -c' just counts
> bytes.

Christian has pointed out that -c has remained bytes (--bytes is a
synonym) because otherwise too many things would break, and that -m,
AKA --chars, has been added to handle multibyte characters.  tr(1)
remains resolutely single-byte, though its documentation talks of
growing multibyte support with a -C complement option.
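A small sketch of the -c/-m difference, assuming GNU coreutils.  The
arrow is written as octal escapes so the script itself stays ASCII.
The C.UTF-8 locale name is an assumption: the name varies between
systems, and if the locale is missing wc falls back to the C locale
and counts bytes.

```shell
#!/bin/sh
# The left arrow (U+2190) plus a newline, as octal escapes: 3 + 1 bytes.

# -c (--bytes) counts bytes, whatever the locale.
printf '\342\206\220\n' | wc -c                  # 4

# -m (--chars) counts characters as LC_CTYPE defines them.
printf '\342\206\220\n' | LC_ALL=C wc -m         # 4: one byte per character
printf '\342\206\220\n' | LC_ALL=C.UTF-8 wc -m   # 2, if C.UTF-8 is installed
```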

    $ od -c <<<←
    0000000 342 206 220  \n
    0000004
    $ 
    $ tr \\220 \\221 <<<←
    ↑
    $
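Why the single-byte tr trick works: ← is U+2190 and ↑ is U+2191, and
in UTF-8 adjacent code points like these differ only in the final
byte.  A sketch of the 3-byte encoding rule in plain sh arithmetic;
utf8_octal is a hypothetical helper that only handles U+0800..U+FFFF
and doesn't reject surrogates.

```shell
#!/bin/sh
# utf8_octal: print the UTF-8 bytes of a code point as octal escapes,
# the notation od -c uses.  3-byte form: 1110xxxx 10xxxxxx 10xxxxxx.
utf8_octal() {
    cp=$(( $1 ))
    if [ "$cp" -lt 2048 ] || [ "$cp" -gt 65535 ]; then
        echo "utf8_octal: only U+0800..U+FFFF handled" >&2
        return 1
    fi
    b1=$(( 0xE0 | (cp >> 12) ))          # top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))  # middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))         # low 6 bits
    printf '\\%03o\\%03o\\%03o\n' "$b1" "$b2" "$b3"
}

utf8_octal 0x2190   # leftwards arrow: \342\206\220
utf8_octal 0x2191   # upwards arrow:   \342\206\221
```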

Things like sed and grep work just fine in a UTF-8 world, though
often a bit more slowly; Unix moved to it some years ago.

    $ sed 'y/\220/\221/' <<<←
    ←
    $ sed y/←/x/ <<<←
    x
    $
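The same locale dependence shows up in regular expressions, where '.'
matches one character and LC_CTYPE decides what a character is.  A
sketch with grep, assuming a grep that treats every byte as a
character in the C locale (modern GNU grep does) and, for the last
line, that a C.UTF-8 locale is installed.

```shell
#!/bin/sh
# The input is the left arrow (U+2190), written as octal escapes.
arrow=$(printf '\342\206\220')

# C locale: the arrow is three one-byte characters.
printf '%s\n' "$arrow" | LC_ALL=C grep -c '^...$'      # 1
printf '%s\n' "$arrow" | LC_ALL=C grep -c '^.$'        # 0

# UTF-8 locale: it is a single character.
printf '%s\n' "$arrow" | LC_ALL=C.UTF-8 grep -c '^.$'  # 1, if C.UTF-8 exists
```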

For the odd occasion when I want to remove locale specifics, I use
~/bin/C as a shorthand.

    $ cat ~/bin/C
    #! /bin/sh

    # LC_ALL has precedence over LANG according to POSIX[1], but we may as
    # well stamp out any traces by setting LANG too.
    # 1.  The Open Group Base Specifications, Ch. 8 Environment Variables.

    LC_ALL=C LANG=C exec "$@"
    $
    $ C sed 'y/←/x/' <<<←
    sed: -e expression #1, char 8: strings for `y' command are different lengths
    $ C sed 'y/←/xyz/' <<<←
    xyz
    $ 

Ken wrote:
> But since UTF-8 has the excellent property that non-ASCII characters
> look like just 8-bit characters but won't ever be mistaken for ASCII
> (not a surprise, since it was designed by two of the original Unix
> geeks)

Ken Thompson and Rob Pike.  (Pike's not quite original, but nearly.)
Rob described its creation, on a placemat in a New Jersey diner in
1992, in a post back in 2012.
https://plus.google.com/+RobPikeTheHuman/posts/Rz1udTvtiMg
There's a comment by me there with a Google Street View image of the diner.
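Ken's point about UTF-8 can be checked byte by byte: ASCII bytes have
the top bit clear, and every byte of a multibyte sequence has it set,
so none of them can be mistaken for '/', '\n' and friends.  A sketch;
classify_bytes is a hypothetical helper, and it only labels bytes, it
doesn't validate UTF-8.

```shell
#!/bin/sh
# classify_bytes: label each byte on stdin by its UTF-8 role.
classify_bytes() {
    od -An -v -tu1 | tr -s ' ' '\n' | while read -r b; do
        [ -n "$b" ] || continue                  # skip od's padding
        if   [ "$b" -lt 128 ]; then echo "$b ascii"         # 0xxxxxxx
        elif [ "$b" -lt 192 ]; then echo "$b continuation"  # 10xxxxxx
        else                        echo "$b lead"          # 11xxxxxx
        fi
    done
}

printf 'a\342\206\220' | classify_bytes
# 97 ascii
# 226 lead
# 134 continuation
# 144 continuation
```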

> I jumped whole-hog into UTF-8 a few years ago, and I haven't regretted
> it one bit.

No regrets here.  You might find iconv(1) useful to convert existing
files from one encoding to another.
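For example, assuming the old files are Latin-1 (ISO-8859-1).
Encoding names accepted by iconv vary between implementations, though
ISO-8859-1 and UTF-8 are widely supported; glibc's and GNU libiconv's
iconv list theirs with -l.  Write to a new file: redirecting onto the
input truncates it before iconv reads it.

```shell
#!/bin/sh
# "café" in ISO-8859-1: é is the single byte 351 (octal).
printf 'caf\351\n' | od -c

# Converted to UTF-8, é becomes the two bytes 303 251.
printf 'caf\351\n' | iconv -f ISO-8859-1 -t UTF-8 | od -c

# For a whole file (old.txt and new.txt are placeholder names):
#   iconv -f ISO-8859-1 -t UTF-8 old.txt > new.txt
```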

Cheers, Ralph.

_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
