Re: Announce: Perl, Unicode and I18N FAQ

Markus Kuhn writes:
: Linux (like Unix) does not provide any per-file or
: per-syscall tagging of character sets and instead the preferred system
: character set can be specified per process using LC_CTYPE.

I'm still wondering if this will be a problem for setuid programs.
Environment variables can't be trusted.  Perhaps we can sanitize
particular values, but the question is deeper than merely whether we
recognize that LC_CTYPE contains "UTF-8".  The question is whether a
setuid program that currently expects 8859-1 can be tricked into doing
something insecure by setting LC_CTYPE to UTF-8.  I don't know the
answer to that yet.  I can imagine that programs that try to weed out
"bad" characters rather than specifying "good" characters could get
into trouble on character classes like whitespace.  Say you weed \n
out of filenames.  What will the shell do if it sees an LS (U+2028) or
PS (U+2029) character in a filename instead?  If the shell interprets
it as a newline, you've got the same problem as if you'd let \n
through.
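
To make the worry concrete, here is a rough sketch in Python (standing
in for both the setuid program and whatever consumes its output).  The
names strip_newlines and count_lines are made up for illustration, and
count_lines is only an assumption about how a Unicode-aware consumer
might behave; it says nothing about what any real shell does.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def strip_newlines(name: str) -> str:
    """Hypothetical deny-list filter: removes only ASCII LF and CR."""
    return name.replace("\n", "").replace("\r", "")


def count_lines(text: str) -> int:
    """Hypothetical Unicode-aware consumer: counts lines the way
    str.splitlines() does, which treats LS and PS as line boundaries."""
    return len(text.splitlines())


hostile = "notes.txt" + LS + "second line"
cleaned = strip_newlines(hostile)

# The deny-list never looked for LS, so it is still in the string.
survived = LS in cleaned

# The consumer now sees two lines where the filter intended one.
lines_seen = count_lines(cleaned)

# Seen through 8859-1 eyes, the UTF-8 bytes of LS are three unrelated
# characters, none of which a Latin-1 filter would think to reject.
latin1_view = LS.encode("utf-8").decode("latin-1")
```

The filter did exactly what it was written to do.  It is the consumer's
idea of "newline" that moved underneath it.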

Of course, we tell people to specify the "good" characters rather than
the "bad" characters, but they don't always.  And even if they do,
the meaning of character classes like "alphanumeric" changes with the
encoding.  What's alphanumeric to Perl might be a delimiter to a
subshell.  It's like the old IFS attack on the Bourne shell.
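
A similar sketch for the "good characters" case, again in Python and
again only an illustration: the same bytes are read once as UTF-8 (the
checking program's view) and once as 8859-1 (a hypothetical consumer's
view).  The helper has_whitespace is invented for this example.

```python
# The bytes for "voila" with a grave accent on the final a, in UTF-8.
raw = "voil\u00e0".encode("utf-8")  # ends in the two bytes C3 A0

as_utf8 = raw.decode("utf-8")       # what a UTF-8 aware checker sees
as_latin1 = raw.decode("latin-1")   # what an 8859-1 consumer sees


def has_whitespace(s: str) -> bool:
    """Hypothetical check: true if any character counts as whitespace."""
    return any(ch.isspace() for ch in s)


# To the checker, every character is alphanumeric, so the value passes.
passes_check = as_utf8.isalnum()

# To the consumer, byte A0 is NO-BREAK SPACE, so the "word" now
# contains something it may treat as a separator.
consumer_sees_space = has_whitespace(as_latin1)
consumer_fields = as_latin1.split()
```

Whether a given shell actually splits on that byte is a separate
question.  The point is only that the two sides no longer agree on
which characters are "good".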

Hopefully, at least on Linux, the shells will become UTF-8 aware at
the same time LC_CTYPE is implemented.  But Perl doesn't run only
on Linux...

Larry