Re: Correct use of UTF-8 under Unix

Larry Wall writes:

> The problem is not so much files as it is interfaces.  What
> percentage of the text you use comes from the system you're on?


Agreed. We have to take care to interface properly with systems that
run in other locales and transmit files in different encodings.

> What's coming down that socket you just opened?


Most high-level programming languages have a means of associating an encoding
with a character stream. In C++ a stream's locale, which carries its
encoding, is set with "imbue()" and queried with "getloc()"; in Lisp
(CLISP) you write

    (socket-connect 80 "www.foo.com" :external-format charset:koi8-r)
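To make the C++ side concrete, here is a minimal sketch, assuming a C++11
compiler. The helper names `locale_or_classic` and `attach_locale` are
hypothetical, and so is any locale name you pass in: names such as
"ru_RU.KOI8-R" are platform-specific, so the sketch falls back to the
classic "C" locale when the system does not know the name. imbue() changes
only the one stream; the global locale is left alone.

```cpp
#include <ios>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: build a named locale, or fall back to the portable
// "C" locale if this system does not provide that name.
std::locale locale_or_classic(const std::string& name) {
    try {
        return std::locale(name);  // throws std::runtime_error if unknown
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: associate a locale with this one stream (and its
// stream buffer) and return what the stream now reports via getloc().
// For file streams such as std::wifstream, the imbued locale's codecvt
// facet is what converts between bytes in the file and characters.
template <typename CharT, typename Traits>
std::locale attach_locale(std::basic_ios<CharT, Traits>& stream,
                          const std::string& name) {
    stream.imbue(locale_or_classic(name));
    return stream.getloc();
}
```

For a file stream, imbue it before doing any I/O on it, e.g.
`std::wifstream in; attach_locale(in, "ru_RU.KOI8-R"); in.open(path);`
where `path` is whatever file you are reading.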

And if your "encoding" data type comprises both the character set and the
end-of-line convention, you can implement the "Unicode Newline Guidelines"
(Unicode Technical Report #13) with the same mechanism, and thereby also get
rid of interoperability problems with MacOS, DOS, and DOS's successors.
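A minimal sketch of such an "encoding" type, assuming it simply pairs a
character set name with an end-of-line convention. `ExternalFormat`,
`LineTerminator` and both functions are hypothetical names; the charset
conversion itself is left out, and only CR, LF and CR LF are handled (not
NEL, LS or PS, which the Unicode Newline Guidelines also cover).

```cpp
#include <string>

// Hypothetical end-of-line conventions: Unix, DOS/Windows, classic MacOS.
enum class LineTerminator { LF, CRLF, CR };

// Hypothetical "encoding" type: character set plus end-of-line convention.
struct ExternalFormat {
    std::string charset;  // e.g. "KOI8-R"; conversion is not shown here
    LineTerminator eol;
};

// Input side: map CR LF, a lone CR, and LF each to a single '\n'.
std::string normalize_newlines(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';
            if (i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;  // skip the LF of a CR LF pair
            }
        } else {
            out += in[i];
        }
    }
    return out;
}

// Output side: expand each '\n' into the format's line terminator.
std::string apply_line_terminator(const std::string& in,
                                  const ExternalFormat& fmt) {
    const char* nl = fmt.eol == LineTerminator::CRLF ? "\r\n"
                   : fmt.eol == LineTerminator::CR   ? "\r"
                                                     : "\n";
    std::string out;
    for (char c : in) {
        if (c == '\n') {
            out += nl;
        } else {
            out += c;
        }
    }
    return out;
}
```

Because the program only ever sees '\n' internally, a file read with a DOS
format and written with a MacOS format needs no special code anywhere else.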

> : [...] (LC_CTYPE, LANG), or via other command line switches.

> You have a major showstopper here as far as us Perl folks are
> concerned.  Neither the environment nor the command line can be trusted
> in a setuid situation.


The LC_CTYPE and LANG environment variables are meant for programs with
which a user interacts directly. A Perl program that does not look at its
environment variables and command line is clearly a different case: for
such a program, which typically reads its data and commands from a socket,
the encoding will be designated by other means (for example, MIME-like
headers).
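For the socket case, a rough sketch of taking the encoding from a MIME-like
header instead of from LC_CTYPE/LANG, assuming RFC 822-style "Name: value"
lines that end at the first empty line. `charset_from_headers` and its
fallback argument are hypothetical; folded header lines and backslash
escapes inside quoted strings are not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case an ASCII string; MIME header and parameter names are
// case-insensitive.
static std::string ascii_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Scan header lines up to the first empty line and return the charset
// parameter of Content-Type, or `fallback` if there is none.
// The environment (LC_CTYPE, LANG) is deliberately never consulted.
std::string charset_from_headers(const std::string& headers,
                                 const std::string& fallback) {
    std::istringstream in(headers);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;  // end of the header block
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (ascii_lower(line.substr(0, colon)) != "content-type") continue;
        // Search case-insensitively, but copy the value from the original.
        std::string::size_type pos = ascii_lower(line).find("charset=", colon);
        if (pos == std::string::npos) continue;
        pos += 8;  // length of "charset="
        std::string::size_type end = line.find_first_of("; \t", pos);
        std::string value = line.substr(
            pos, end == std::string::npos ? std::string::npos : end - pos);
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
            value = value.substr(1, value.size() - 2);  // strip quotes
        if (!value.empty()) return value;
    }
    return fallback;
}
```

The fallback is fixed by the program itself (say "UTF-8"), so the result
does not depend on who runs the program or with which privileges.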

> Around here people will actually shudder if you say "POSIX".


But remember that it is adherence to the POSIX standards that has allowed
so many applications to be ported to Linux.

           Bruno
