On Sat, 3 Nov 2001, Eli Zaretskii wrote:
ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO-4.html
This is still silent about Grep, Sort, and tr, which are
the utilities where the non-ASCII support should be a non-trivial
change.
Basically, even after reading that page (which told me something I
didn't know in some cases), Unicode support in basic development
tools is still very much rudimentary.
In practice, Perl long ago replaced grep, sort, tr, and awk for all but
sentimental reasons. Most of these little silly things were written as
inefficient separate C processes before 1975 for the sole reason that the
PDP-11 that Ritchie and Thompson used had only 64 kB of RAM and couldn't
handle any larger multi-function tools:
http://www.bell-labs.com/history/unix/
http://www.bell-labs.com/history/unix/firstport.html
Today, these tiny tools mostly lead people to write extremely inefficient
shell scripts that spend 90% of their time in fork().
UTF-8 support for Perl is in an advanced state, and for some more
experienced UTF-8 users, "grep", "sort", "tr", etc. are merely convenient
and nostalgic shell functions or scripts that call perl to do the job.
[I sometimes wish we could give up the classic Bourne-style shell with
its baroque Algol-inspired syntax entirely and that perl had the few
facilities (e.g., prompts, readline-history, compact
command-invocation/argv/piping/redirecting notation, etc.) that are still
missing before we can turn it into the main command-line shell.]
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>