Hi,
Norm wrote:
I am not at all secure about how the standard GNU utilities will
handle non-ascii characters. For example, 'wc -c', just counts
bytes.
Christian has pointed out that -c has remained bytes, with --bytes as a
synonym, because otherwise too many things would break, and that -m,
AKA --chars, has been added to count multi-byte characters. tr(1)
remains resolutely single-byte, though the documentation talks of
growing multibyte support, with a -C complement option.
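To make the wc distinction concrete, here's a quick check; the -m
count assumes the shell is running in a UTF-8 locale:

```shell
# ← (U+2190) encodes to three bytes in UTF-8: 0342 0206 0220.
printf '←' | wc -c    # bytes: 3
printf '←' | wc -m    # characters: 1 in a UTF-8 locale, 3 under LC_ALL=C
```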
$ od -c <<<←
0000000 342 206 220 \n
0000004
$
$ tr \\220 \\221 <<<←
↑
$
Things like sed and grep work just fine in a UTF-8 world, though often
a bit more slowly; Unix moved to it some years ago.
$ sed 'y/\220/\221/' <<<←
←
$ sed y/←/x/ <<<←
x
$
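The same byte/character split shows up in what grep's `.` matches.
A quick check (the first command pins the locale, so its result is
deterministic; the second only prints 1 when run in a UTF-8 locale):

```shell
# Under the C locale, grep's `.` matches one byte, so the
# three-byte ← needs three dots:
printf '←\n' | LC_ALL=C grep -c '^...$'    # prints 1
# In a UTF-8 locale, `.` matches one character:
printf '←\n' | grep -c '^.$'               # 1 in a UTF-8 locale only
```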
For the odd occasion when I want to remove locale specifics, I use
~/bin/C as a shorthand.
$ cat ~/bin/C
#! /bin/sh
# LC_ALL has precedence over LANG according to POSIX[1], but we may as
# well stamp out any traces by setting LANG too.
# 1. The Open Group Base Specifications, Ch. 8 Environment Variables.
LC_ALL=C LANG=C exec "$@"
$
$ C sed 'y/←/x/' <<<←
sed: -e expression #1, char 8: strings for `y' command are different lengths
$ C sed 'y/←/xyz/' <<<←
xyz
$
Ken wrote:
But since UTF-8 has the excellent property that non-ASCII characters
look like just 8-bit characters but won't ever be mistaken for ASCII
(not a surprise, since it was designed by two of the original Unix
geeks)
Ken Thompson and Rob Pike. (Pike's not quite original, but nearly.)
Back in 2012, Rob recounted its creation, sketched out on a napkin in
a diner:
https://plus.google.com/+RobPikeTheHuman/posts/Rz1udTvtiMg
There's a comment by me there with a Google Streetview of the diner.
I jumped whole-hog into UTF-8 a few years ago, and I haven't regretted
it one bit.
No regrets here. You might find iconv(1) useful to convert existing
files from one encoding to another.
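For example (the filenames here are hypothetical; -f and -t name the
source and target encodings):

```shell
# Convert a Latin-1 file to UTF-8 (old.txt/new.txt are made-up names).
iconv -f ISO-8859-1 -t UTF-8 old.txt > new.txt
# iconv also works in a pipe; Latin-1 0351 is é, which becomes the
# two UTF-8 bytes 0303 0251:
printf '\351\n' | iconv -f ISO-8859-1 -t UTF-8 | od -c
```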
Cheers, Ralph.
_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers