On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote:
Whoa! It's the other way round here. Nick is using a locale that
suits him for other reasons (e.g. getting time and date formats the
proper British way), but why should that constrain him from using
whatever he wants for his filenames?
Then, he should switch to en_GB.UTF-8.
That will work if en_GB.UTF-8 is available for him on his
particular Unixes, and assuming that using UTF-8 locales won't break
other things.
IIRC, he explicitly mentioned 'Linux' in his message. Besides,
Solaris, Compaq Tru64, AIX, and HP-UX [1] have all supported UTF-8 locales
for a 'long' time (some of them far, far longer than Linux/glibc has). In
the past, not all locales came free, but these days they all come at
no extra charge, so whether they're available depends on the
'will'/'policy' of the system administrators. Sure, there are a
number of other Unixes, old and new, and many old ones don't support UTF-8
locales.
I do want to respect people's wish to use UTF-8 file names on their file
systems even if their version of Unix doesn't support UTF-8 locales.
Otherwise, I wouldn't have come up with a set of 'options' Perl can
offer them. However, people doing so should be aware that there's a
price to pay. For instance, in their shell, file names would not be
shown correctly (i.e. 'ls' would show garbled characters), and they
can't use the usual set of Unix tools (e.g. 'find' wouldn't work as intended).
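To make the 'garbled characters' point concrete: the kernel stores only the bytes of a name, and the viewer's locale decides how to render them. The minimal sketch below is in Python rather than Perl, purely for illustration, and the file name is made up. It encodes one name under both encodings and then decodes each with the 'wrong' one, as 'ls' effectively does in a mismatched locale.

```python
# A minimal sketch: one file name, two byte encodings, two mismatched views.
# On traditional Unix only these bytes reach the disk; how 'ls' shows them
# depends on the encoding of the viewer's locale.

name = "caf\u00e9.txt"  # hypothetical file name: "café.txt"

latin1_bytes = name.encode("iso-8859-1")  # b'caf\xe9.txt'
utf8_bytes = name.encode("utf-8")         # b'caf\xc3\xa9.txt'

# Latin-1 bytes viewed in a UTF-8 locale: a lone 0xE9 is not valid UTF-8,
# so a careful decoder substitutes U+FFFD (the replacement character).
seen_in_utf8_locale = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes viewed in a Latin-1 locale: every byte maps to some character,
# so 0xC3 0xA9 silently turns into the two characters "Ã©".
seen_in_latin1_locale = utf8_bytes.decode("iso-8859-1")

# ascii() keeps the output printable even on a non-UTF-8 terminal.
print(ascii(seen_in_utf8_locale))
print(ascii(seen_in_latin1_locale))
```

Because the two byte strings differ, a name pattern typed in one locale also won't match a file name stored in the other encoding, which is the 'find' problem.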
ISO-8859-1, which is why I wrote about mixing up two encodings
in a single file system _under_ his control.
I think we are talking past each other here :-) I'm assuming that
not all file systems (like Samba mounts) are necessarily under
his control; you are assuming they are.
Well, I think that's a different story. He explicitly wrote why
he still uses en_GB.ISO-8859-1 (e.g. some old programs break under a
UTF-8 locale).
Moreover, why would you think that the en_GB.UTF-8 locale gives him
time and date formats NOT suitable for him?
I'm not thinking that. I think his point is that plain
en_GB.iso88591 is _enough_ for him to get time/date formats etc.
working right, but en_GB.UTF-8 brings in _too much_ (such as some
programs not yet being UTF-8 aware enough,
What you had in parentheses was what he wrote in his original message,
but what you wrote didn't sound like that to me. At least, time/date
formats were a bad example to pick.
or him wanting to use iso8859-1 file names in some directories, but in
some directories not).
Yes, that's what I meant. He made a conscious decision to
mix up two encodings (read his message: 'If I want Unicode characters
in file names, I'd just use UTF-8', or something like that), and for that
he has to pay whatever price it carries. If Perl offers a set of
options as I outlined in my previous message, he has to be careful when
opening files in different directories: for some directories he has to
use one option, and for others, another.
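The per-directory bookkeeping this implies could look roughly like the Python sketch below. It is not Perl and not any option Perl actually provides; `DIR_ENCODINGS`, `DEFAULT_ENCODING`, `encoding_for`, `to_native` and `from_native` are hypothetical names, and the paths are invented. Each directory tree is assigned the byte encoding its file names use, and names are converted to and from Unicode accordingly.

```python
import posixpath

# Hypothetical policy: the byte encoding used for file names under each
# directory tree. Paths are invented for illustration.
DIR_ENCODINGS = {
    b"/home/nick/latin1": "iso-8859-1",
    b"/home/nick/utf8": "utf-8",
}
# Hypothetical fallback, e.g. the encoding of an en_GB.ISO-8859-1 locale.
DEFAULT_ENCODING = "iso-8859-1"


def encoding_for(directory: bytes) -> str:
    """Return the encoding of the longest configured prefix of `directory`."""
    best, best_len = DEFAULT_ENCODING, -1
    for prefix, enc in DIR_ENCODINGS.items():
        inside = directory == prefix or directory.startswith(prefix + b"/")
        if inside and len(prefix) > best_len:
            best, best_len = enc, len(prefix)
    return best


def to_native(directory: bytes, name: str) -> bytes:
    """Unicode file name -> the path bytes actually stored on disk."""
    raw = name.encode(encoding_for(directory), errors="surrogateescape")
    return posixpath.join(directory, raw)


def from_native(directory: bytes, raw: bytes) -> str:
    """Bytes read back from the directory (e.g. a listing) -> Unicode name."""
    # surrogateescape keeps undecodable bytes round-trippable
    return raw.decode(encoding_for(directory), errors="surrogateescape")
```

For example, `to_native(b"/home/nick/utf8/sub", "caf\u00e9.txt")` yields `b"/home/nick/utf8/sub/caf\xc3\xa9.txt"`, while the same name under `b"/home/nick/latin1"` becomes `b"/home/nick/latin1/caf\xe9.txt"`. Getting the table wrong for a directory is exactly the kind of care the user would have to take.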
You're making a mistake of binding locale and encoding.
I'm not; many UNIX vendors do, and I have to deal with that fact. If Linux
and glibc are doing the Right Thing, that's marvelous, but not all the
world is Linux and glibc.
I never implied that, let alone said it. (I always prefer to say
Unix in place of Linux. To me, Linux is just one of many Unixes.) And
please check out recent commercial Unixes. They DO offer UTF-8 locales, as I
wrote above (Solaris and AIX offered solid UTF-8 locales years before
Linux/glibc did; in fact, already back when Linux/glibc 1.x had almost
__zero__ locale support, UTF-8 or not). Whether they're installed by the
system admin is a different story. Anyway, it's exactly because UTF-8
locales are unavailable for whatever reason that we've been discussing
this issue (converting Perl's internal Unicode to and from the 'native'
encoding in file I/O).
The fact that it is on Unix is just an artifact of Unix file system
Not quite. UNIX doesn't care. In traditional UNIX, filenames are just
bytes.
You're absolutely right. I didn't mean to say 'file system' there,
as I corrected in my subsequent email.
PLEASE, PEOPLE: stop thinking of this in terms of an environment
controlled solely by one user.
Before writing that, please read the man pages of 'smbmount' and
'mount' if a Linux system is available to you. They're not environment
variables.
Please read my sentence again to see that I had no "variable" in it :-)
Just environment.
OK. Sorry for misreading it. Anyway, Perl can't help resolve that problem.
It can only offer a set of flexible options (as I listed a few
messages ago) that help people solve the problem for themselves.
Jungshik
[1] SGI IRIX seems to lag behind in this area. FreeBSD was slow, but
seems to have caught up recently.