Larry Wall wrote on 1999-12-18 21:01 UTC:
> Markus Kuhn writes:
> : Linux (like Unix) does not provide any per-file or
> : per-syscall tagging of character sets and instead the preferred system
> : character set can be specified per process using LC_CTYPE.
>
> I'm still wondering if this will be a problem for setuid programs.
> Environment variables can't be trusted.

The author of a setuid program has to be a bit careful here. The safety
rules are roughly the following:
- Any input provided by the calling user, and any output sent back to
her, is processed under the LC_CTYPE that she has specified. Rationale:
this I/O is under the full control of the user, so she could recode
anything in it anyway. We can therefore let her have full control over
how her I/O is recoded into UTF-8 by Perl internally.
- Any input that the system administrator provides to a setuid program,
and any output the program produces for the administrator, is
interpreted under a fixed locale specified by the system administrator
at install time. For instance, passwd has to know, by configuration,
what encoding is used in /etc/passwd. How data read from /etc/passwd
is recoded into UTF-8 internally by Perl must definitely not depend on
the LC_CTYPE value provided by the calling user.
- For other programs called by a setuid program, the programmer will have
to decide on a case-by-case basis whether the user's or the installer's
locale shall be used. This determines what I/O re-encoding will be
applied to and from these programs and which locale variables will be
exported to them. The decision will depend mostly on how
locale-sensitive the called program is, whether it will see user data
directly or re-encoded, and whether the called program can be trusted
to handle correctly any locale that the user might have specified.
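The three rules above can be sketched as a small encoding policy. This
is a minimal Python illustration under stated assumptions, not how Perl
actually implements it: the function names user_codeset, codeset_for
and child_locale_env, the source labels "user" and "system", and the
install-time constants INSTALLER_LOCALE and INSTALLER_CODESET are all
hypothetical. The codeset is also read naively from the locale name
(language[_territory][.codeset][@modifier]) instead of being looked up
in the system's locale database.

```python
from typing import Dict, Mapping

# Hypothetical install-time settings chosen by the system administrator.
INSTALLER_LOCALE = "en_GB.ISO-8859-1"  # exported to untrusted child programs
INSTALLER_CODESET = "ISO-8859-1"       # assumed encoding of /etc/passwd etc.

# Variables that select LC_CTYPE, in POSIX precedence order.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def user_codeset(env: Mapping[str, str]) -> str:
    """Codeset requested by the calling user through LC_ALL, LC_CTYPE or LANG.

    Simplification: the codeset is taken from the locale name itself
    instead of the locale database; names without one count as ASCII.
    """
    for var in LOCALE_VARS:
        name = env.get(var, "")
        if name:  # the first non-empty variable decides
            if "." not in name:
                return "ASCII"  # e.g. "C" or "POSIX"
            return name.split(".", 1)[1].split("@", 1)[0]
    return "ASCII"


def codeset_for(source: str, env: Mapping[str, str]) -> str:
    """Pick the encoding for one data source of a setuid program."""
    if source == "user":    # rule 1: the user's own I/O
        return user_codeset(env)
    if source == "system":  # rule 2: administrator data such as /etc/passwd
        return INSTALLER_CODESET
    raise ValueError(f"unknown data source: {source!r}")


def child_locale_env(trust_user_locale: bool, env: Mapping[str, str]) -> Dict[str, str]:
    """Rule 3: locale variables to export to a program run by the setuid program.

    Returns only the locale part of the child's environment.
    """
    if trust_user_locale:
        return {var: env[var] for var in LOCALE_VARS if var in env}
    return {"LC_ALL": INSTALLER_LOCALE}


caller_env = {"LANG": "de_DE.UTF-8", "LC_ALL": "ja_JP.EUC-JP"}
print(codeset_for("user", caller_env))    # EUC-JP (LC_ALL overrides LANG)
print(codeset_for("system", caller_env))  # ISO-8859-1, regardless of the caller
```

The point of the design is that the caller's environment is consulted
only for the "user" source; data belonging to the administrator never
passes through it.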
In general, I hope and suggest that the really security-critical
characters (shell quotes, etc.) are all ASCII characters, and that the
installed locales should support only ASCII-compatible encodings
(encodings in which the bytes 0x00 to 0x7F always represent the ASCII
characters, never parts of other characters) such as ISO 8859, UTF-8,
EUC, etc., but NOT Shift-JIS, national ISO 646 variants, etc. This
should greatly reduce the risks we know from IFS and similar
mechanisms, where the semantics of ASCII characters could be changed.
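One way to make "ASCII-compatible" testable is to require two things:
every byte below 0x80 decodes to the ASCII character with the same
value, and no non-ASCII character is ever encoded using a byte below
0x80. A byte-oriented scanner looking for quotes or backslashes can then
never be fooled by part of a multibyte character. Shift-JIS fails the
second condition: the kanji U+8868 is encoded as 0x95 0x5C, and 0x5C is
the ASCII backslash. The Python sketch below checks both conditions with
the standard codec machinery; the function name is hypothetical, and it
only samples the Basic Multilingual Plane, so a pass is evidence rather
than proof. Python ships no codecs for the national ISO 646 variants, so
those cannot be checked this way.

```python
def is_ascii_transparent(encoding: str) -> bool:
    """Hypothetical check for an ASCII-compatible encoding.

    1. Each byte 0x00-0x7F decodes to the ASCII character of the same value.
    2. No non-ASCII character in the BMP encodes to any byte below 0x80, so
       quotes, backslashes, etc. never occur inside a multibyte character.
    """
    for b in range(0x80):
        try:
            if bytes([b]).decode(encoding) != chr(b):
                return False
        except UnicodeDecodeError:
            return False  # e.g. UTF-16, where one byte is not a character
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not characters
        try:
            encoded = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # character not representable in this encoding
        if any(byte < 0x80 for byte in encoded):
            return False
    return True


for enc in ("utf-8", "iso8859-1", "iso8859-15", "shift_jis"):
    print(enc, is_ascii_transparent(enc))
# utf-8, iso8859-1 and iso8859-15 pass; shift_jis fails
```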
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>