perl-unicode

Re: perlunicode comment - when Unicode does not happen

2003-12-28 09:30:04
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
   Let's not 'fix' it (not carve it in stone), but offer a few
well-thought-out options. For instance, Perl may offer (not that these
are particularly well-thought-out) 'just treat this as a sequence of
octets', 'locale', and 'unicode'. 'locale' on Unix means the multibyte
encoding returned by nl_langinfo(CODESET) or equivalent.  On Windows,
it's whatever the 'A' APIs accept or is returned by ACP_??().  'unicode'
is utf8 on Unix-like OSes and BeOS, and 'utf-16(le)' on Windows.
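
(To make the three options concrete, here is a rough sketch. It is in Python
rather than Perl and is not any existing Perl interface: the helper name
encode_filename, its policy strings, and the platform and codeset parameters
are all made up for illustration. locale.getpreferredencoding() stands in
for nl_langinfo(CODESET).)

```python
import locale

def encode_filename(name, policy, platform="unix", codeset=None):
    """Hypothetical helper: return the bytes that would be handed to the OS."""
    if policy == "octets":
        # The caller already holds raw bytes; pass them through untouched.
        if not isinstance(name, bytes):
            raise TypeError("'octets' policy expects bytes")
        return name
    if policy == "locale":
        # Assumption: the locale's codeset decides the encoding unless
        # the caller overrides it explicitly.
        codeset = codeset or locale.getpreferredencoding(False)
        return name.encode(codeset)
    if policy == "unicode":
        # UTF-8 on Unix-like systems, UTF-16LE on Windows, as proposed above.
        return name.encode("utf-16-le" if platform == "windows" else "utf-8")
    raise ValueError(f"unknown policy: {policy!r}")

if __name__ == "__main__":
    name = "t\u0177.txt"  # Welsh "ty" with a circumflex on the y
    print(encode_filename(name, "unicode"))
    print(encode_filename(name, "unicode", platform="windows"))
```

With the 'locale' policy and an ISO-8859-1 codeset the same name cannot be
encoded at all, which is the case this thread is arguing about.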

Something like that could work, yes.

Agreed.


creating files with UTF-8 names while still using en_GB.ISO-8859-1
locale. Why does Perl have to be held responsible for your intentional
act that is bound to break things?

Whoa!  It's the other way round here.  Nick is using a locale that suits
him for other reasons (e.g. getting time and date formats in proper
British ways), but why should he be constrained not to use for his
filenames whatever he wants?

I was at least partly being a devil's (UTF-8) advocate anyway, and to that 
end Jungshik Shin's intervention saying use a UTF-8 locale is positive.
When I want non-ASCII it is for one of the following:
  For phonetics for the speech synthesis stuff
  To represent Euro currency symbol
  To typeset mother-in-law's Welsh poetry
  For cross-references for Japanese customers of the day job

There is no "locale" for phonetics; there is one for Euro issues, of course,
but setting my locale to "cy_GB" so I can name files by poem is going
to render dates and the like opaque to me, the user, and likewise
for Japanese. So for _my_ use UTF-8 is what I want, but I _don't_ want
some locale-derived multi-byte guess. Unicode suits me.



  Well, actually, if your WinXP file system has only characters covered
by Windows-1252,

Well, AFAIK there isn't a Windows code page that covers Welsh accented
characters (and certainly not if you mix in phonetics).
The shared drives I mount at work have users who are native speakers
not only of English, Italian, Norwegian and Swedish, but also of two kinds
of Chinese and various Indian languages, and we have Japanese customers,
so even in a small English startup cp1252 does not give them
all the freedom to give files natural names.
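
(A quick way to check the coverage claim: the sketch below, in Python and
with made-up sample filenames and a hypothetical helper called
representable, tries to encode a few names in cp1252 and in UTF-8.)

```python
# Illustrative sample names only; none of these come from a real share.
SAMPLES = {
    "Welsh": "t\u0177.txt",            # y with circumflex, U+0177
    "phonetics": "\u0259.wav",         # IPA schwa, U+0259
    "Japanese": "\u53c2\u7167.txt",    # two kanji
    "Euro": "\u20ac.txt",              # Euro sign, U+20AC
}

def representable(name, codec):
    """Return True if every character of name can be encoded in codec."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    for label, name in SAMPLES.items():
        print(label, representable(name, "cp1252"), representable(name, "utf-8"))
```

Of these samples only the Euro name fits in cp1252; all four fit in UTF-8.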


And how would Nick know that, or how could he guarantee that, if the
Windows share is in multiuser use?

PLEASE, PEOPLE: stop thinking of this in terms of an environment
controlled solely by one user.

Exactly - a file system should be able to cope even if files
are named in English, Welsh, Chinese, ...

So IMHO perl's -d etc. should be helping the move to Unicode, not
pandering to multi-byte compromises. I have no objection to some
way to name files in Shift-JIS if that has been done, but I hope
"unicode" becomes the common practice.

