Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
Let's not 'fix' it (not carve it on a stone), but offer a few
well-thought-out options. For instance, Perl may offer (not that these
are particularly well-thought-out) 'just treat this as a sequence of
octets', 'locale', and 'unicode'. 'locale' on Unix means multibyte
encoding returned by nl_langinfo(CODESET) or equivalent. On Windows,
it's whatever 'A' APIs accept or is returned by ACP_??(). 'unicode'
is utf8 on Unix-like OS, BeOS and 'utf-16(le)' on Windows.
Something like that could work, yes.
Agreed.
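
To make the three proposed options concrete, here is a minimal sketch of how the same filename bytes read under each one. It is written in Python purely for illustration and is not Perl's API: the function name interpret_filename and the mode strings are hypothetical, and the 'unicode' branch covers only the Unix-like UTF-8 case, not UTF-16LE on Windows.

```python
import locale
from typing import Optional, Union


def interpret_filename(raw: bytes, mode: str,
                       codeset: Optional[str] = None) -> Union[bytes, str]:
    """Interpret raw filename bytes under one of three hypothetical policies."""
    if mode == "octets":
        # Treat the name as an opaque sequence of octets.
        return raw
    if mode == "locale":
        # Decode with the locale's codeset (what nl_langinfo(CODESET)
        # would report on Unix), unless the caller names one explicitly.
        return raw.decode(codeset or locale.getpreferredencoding(False))
    if mode == "unicode":
        # Decode as UTF-8, independent of the current locale.
        return raw.decode("utf-8")
    raise ValueError(f"unknown mode: {mode!r}")


# A Welsh filename stored on disk as UTF-8 bytes.
raw = "tŷ.txt".encode("utf-8")

print(interpret_filename(raw, "octets"))
print(interpret_filename(raw, "locale", "iso-8859-1"))
print(interpret_filename(raw, "unicode"))
```

Under an ISO-8859-1 codeset the two UTF-8 bytes of the accented letter decode as two unrelated Latin-1 characters, which is the mismatch discussed in the rest of this thread.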
creating files with UTF-8 names while still using en_GB.ISO-8859-1
locale. Why does Perl have to be held responsible for your intentional
act that is bound to break things?
Whoa! It's the other way round here. Nick is using a locale that suits
him for other reasons (e.g. getting time and date formats in proper
British ways), but why should he be constrained not to use for his
filenames whatever he wants?
I was at least partly being a devil's (UTF-8) advocate anyway, and to that
end Jungshik Shin's intervention saying use a UTF-8 locale is positive.
When I want non-ASCII it is for one of the following:
For phonetics for the speech synthesis stuff
To represent the Euro currency symbol
To typeset my mother-in-law's Welsh poetry
For cross-references for Japanese customers of the day job
There is no "locale" for phonetics, there is for Euro issues of course,
but setting my locale to "cy_GB" so I can name files by poem is going
to render dates and the like opaque to me the user, likewise
for Japanese. So for _my_ use UTF-8 is what I want - but I _don't_ want
some locale derived multi-byte guess. Unicode suits me.
Well, actually, if your WinXP file system has only characters covered
by Windows-1252,
Well AFAIK there isn't a Windows code page that covers Welsh accented
characters (and certainly not if you mix in phonetics).
The shared drives at work I mount have users who are native speakers
of not only English, Italian, Norwegian, Swedish, but also two kinds of
Chinese, and various Indian languages - and we have Japanese customers,
so even in a small English startup cp1252 does not give them
all the freedom to give files natural names.
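
As a rough check of the point above, the following sketch (Python, for illustration only; the helper name encodable and the sample filenames are made up) tests which names a given encoding can represent at all.

```python
def encodable(name: str, codec: str) -> bool:
    """Return True if every character of name exists in the given codec."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample names: plain ASCII, Welsh, phonetics, Japanese.
names = ["report.txt", "tŷ.txt", "həˈləʊ.txt", "日本語.txt"]

for name in names:
    print(name, encodable(name, "cp1252"), encodable(name, "utf-8"))
```

Only the ASCII name fits in cp1252; all four names encode as UTF-8.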
And how would Nick know that, or how could he guarantee that, if the
Windows share is in multiuser use?
PLEASE, PEOPLE: stop thinking of this in terms of an environment
controlled solely by one user.
Exactly - a file system should be able to cope even if files
are named in English, Welsh, Chinese, ...
So IMHO perl's -d etc. should be helping the move to Unicode not
pandering to multi-byte compromises. I have no objection to some
way to name files in shift-jis if that has been done, but I hope for
"unicode" to become common practice.