
Re: [Nmh-workers] I like neither green eggs and ham nor MIME

2014-07-18 08:54:02
I am not at all secure about how the standard GNU utilities will handle
non-ASCII characters. For example, 'wc -c' just counts bytes. True,
the man page talks about bytes, not characters, but I am still left
uncomfortable.  Then there are the dozens of bash, python, and perl
scripts that I have accumulated over the years.

My experience has been that a modern system handles 8-bit characters just
fine.

Now, where things get a little tricky is with multibyte encodings
like UTF-8.  Not everyone has broken from the paradigm that 1 byte == 1
character, as you noted (we had to do a bunch of work in the format
engine to fix that).  But UTF-8 has the excellent property that every
byte of a non-ASCII character has the high bit set, so those bytes look
like ordinary 8-bit characters and won't ever be mistaken for ASCII (not
a surprise, since it was designed by two of the original Unix geeks).
Because of that, I haven't come across a program where it truly breaks.
I don't write in Python, but Perl support for UTF-8 is excellent and I
would be shocked if the situation for Python weren't the same.
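
To make the byte-versus-character point concrete, here is a minimal
Python 3 sketch. The sample string and the variable names are made up
for illustration and are not taken from nmh or any script mentioned in
this thread. It shows that the byte count (what 'wc -c' would report)
differs from the character count, and that every byte belonging to a
non-ASCII character is >= 0x80, so it cannot be confused with ASCII.

```python
# Hypothetical sample text: "naive cafe" with two accented characters,
# written with escapes so the source file itself stays pure ASCII.
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")

char_count = len(text)      # number of Unicode code points
byte_count = len(encoded)   # number of bytes, i.e. what 'wc -c' counts

# Split the encoded bytes by whether the high bit is set.
non_ascii_bytes = [b for b in encoded if b >= 0x80]
ascii_bytes = [b for b in encoded if b < 0x80]

# The low (ASCII-range) bytes are exactly the ASCII characters of the
# original text; no byte of a multibyte sequence lands in that range.
ascii_only_text = "".join(ch for ch in text if ord(ch) < 128)
ascii_is_unambiguous = bytes(ascii_bytes).decode("ascii") == ascii_only_text

print(char_count, byte_count, len(non_ascii_bytes), ascii_is_unambiguous)
```

A byte-oriented tool that only looks for ASCII delimiters (newlines,
spaces, colons) therefore keeps working on UTF-8 input; what goes wrong
is anything that assumes one byte per character, such as column
counting or truncating at a fixed width.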

I jumped whole-hog into UTF-8 a few years ago, and I haven't regretted
it one bit.

--Ken

