Re: Announce: Perl, Unicode and I18N FAQ

Larry Wall wrote on 1999-12-18 21:01 UTC:

Markus Kuhn writes:
: Linux (like Unix) does not provide any per-file or
: per-syscall tagging of character sets and instead the preferred system
: character set can be specified per process using LC_CTYPE.

I'm still wondering if this will be a problem for setuid programs.
Environment variables can't be trusted.


The author of a setuid program has to be a bit careful here. The safety
rules are roughly the following:

  - Any input provided by the calling user, or output sent to her, is
    processed under the LC_CTYPE that she has specified. Rationale:
    this I/O is under the full control of the user, so she can recode
    anything in it anyway. Therefore, we can let her have full control
    over how her I/O is recoded into UTF-8 internally by Perl.

  - Any input/output that a setuid program exchanges with the system
    administrator is interpreted under a fixed locale specified by the
    system administrator at install time. For instance, passwd has to
    know, by configuration, what encoding is used in /etc/passwd. How
    data read from /etc/passwd is recoded into UTF-8 internally by Perl
    must definitely not depend on the LC_CTYPE value provided by the
    calling user.

  - For other programs called by a setuid program, the programmer will
    have to decide case by case whether the user-provided or the
    installer-provided locale should be used. This determines what I/O
    re-encoding to/from these programs will be applied and which locale
    variables will be exported. The decision will depend mostly on how
    locale-sensitive the called program is, whether it will see user
    data directly or re-encoded, and whether the called program can be
    trusted to correctly handle any locale that the user might have
    specified.
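A minimal Python sketch of these three rules, for illustration only (the
post is about Perl, and a real setuid program would not be a script).
ADMIN_ENCODING, INSTALLER_LOCALE, INSTALLED_CODESETS and all function
names are hypothetical install-time choices, not part of any existing
tool.

```python
# Hypothetical install-time settings chosen by the system administrator.
# They are fixed when the program is installed and never read from the
# calling user's environment.
ADMIN_ENCODING = "utf-8"           # encoding of files such as /etc/passwd
INSTALLER_LOCALE = "en_US.UTF-8"   # locale exported to untrusted helpers

# Hypothetical list of installed (ASCII-compatible) codesets, keyed by a
# normalised spelling: lower case, punctuation removed.
INSTALLED_CODESETS = {
    "utf8": "utf-8",
    "iso88591": "iso-8859-1",
    "iso885915": "iso-8859-15",
    "eucjp": "euc-jp",
}


def user_encoding(environ):
    """Rule 1: encoding for the calling user's own I/O, from her locale.

    Trusting the environment is acceptable here because the user controls
    this I/O anyway. POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Codesets that are not installed fall back to ASCII, so untrusted
    strings are never passed to a codec lookup.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if not value:
            continue  # unset or empty: try the next variable
        # Locale names look like language_TERRITORY.codeset@modifier.
        codeset = value.split("@", 1)[0].partition(".")[2]
        key = "".join(c for c in codeset.lower() if c.isalnum())
        return INSTALLED_CODESETS.get(key, "ascii")
    return "ascii"


def decode_user_input(data, environ):
    """Rule 1: decode bytes the user sent us under her own locale."""
    return data.decode(user_encoding(environ), errors="replace")


def decode_admin_data(data):
    """Rule 2: decode administrator data (e.g. /etc/passwd).

    Deliberately takes no environment argument: the caller's LC_CTYPE
    must not influence how this data is interpreted. Strict decoding,
    so a misconfigured file raises an error instead of being guessed at.
    """
    return data.decode(ADMIN_ENCODING)


def child_locale_env(environ, trust_user_locale):
    """Rule 3: locale variables to export to a program we call.

    The caller decides case by case. Only locale variables are returned;
    a real program would also have to sanitise PATH, IFS, etc.
    """
    if trust_user_locale:
        return {k: v for k, v in environ.items()
                if k.startswith("LC_") or k in ("LANG", "LANGUAGE")}
    return {"LC_ALL": INSTALLER_LOCALE}
```

For example, with `LANG=de_DE.ISO-8859-1` the bytes `b"M\xfcller"` from
the user decode to "Müller", while `decode_admin_data` keeps using the
installed encoding whatever the caller sets. A codeset the administrator
did not install (say `ja_JP.SJIS`) is treated as ASCII rather than
honoured.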

In general, I hope and suggest that the really security-critical
characters (shell quotes, etc.) are all ASCII characters, and that the
installed locales should support only ASCII-compatible encodings (those
in which the bytes 0x00-0x7F always represent the corresponding ASCII
characters), such as ISO 8859, UTF-8, EUC, etc., but NOT Shift-JIS,
national ISO 646 variants, etc. This should greatly reduce the risks we
know from IFS, etc., where the semantics of ASCII characters could be
changed.
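One way to make "ASCII-compatible" concrete is a small Python sketch
(the helper name ascii_bytes_in_non_ascii is hypothetical): encode some
non-ASCII characters and report any whose encoded form contains a byte
in the range 0x00-0x7F. It uses only the codecs that ship with Python
and checks a sample of characters, not an encoding's whole repertoire.

```python
def ascii_bytes_in_non_ascii(text, encoding):
    """List (char, encoded bytes) for every non-ASCII character of
    `text` whose encoding contains a byte in the ASCII range 0x00-0x7F.

    For UTF-8 or ISO 8859 this list is empty for any input, so a byte
    such as 0x5C (backslash) or 0x27 (single quote) can only ever mean
    that ASCII character.
    """
    hits = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII characters are expected to stay ASCII
        encoded = ch.encode(encoding)
        if any(b < 0x80 for b in encoded):
            hits.append((ch, encoded))
    return hits


# KATAKANA LETTER SO (U+30BD) and the kanji U+8868 ("table").
sample = "\u30bd\u8868"
for enc in ("utf-8", "euc_jp", "shift_jis"):
    print(enc, ascii_bytes_in_non_ascii(sample, enc))
```

With this sample, UTF-8 and EUC-JP report nothing, while Shift_JIS
encodes both characters with a second byte of 0x5C, the ASCII
backslash. A byte-oriented quoting or escaping routine that scans for
backslashes would misread such text, which is exactly the kind of
problem the restriction above is meant to rule out.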

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>