perl-unicode

Re: perlunicode comment - when Unicode does not happen

2003-12-23 05:30:05
Ed Batutis <ed(_at_)batutis(_dot_)com> writes:
"Jarkko Hietaniemi" <jhi(_at_)iki(_dot_)fi> wrote in message
news:0C06A42A-34CE-11D8-A034-00039362CB92(_at_)iki(_dot_)fi...

You do know that ...
Yes.

If wctomb or mbtowc is to be used, then Perl's Unicode must be converted
either to the locale's wide-character encoding or to its multibyte encoding.

Locale is per-user - file systems on Unix are multi-user and there is 
no meta-data to say which locale a user was in when they wrote the file.

Many C libraries' locale support doesn't give access to what the encoding
_is_ - and when it does there is next to no standardization of the
names used.

We tried using locale(); it really didn't help...
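
To illustrate the point about encoding names, here is a minimal sketch of
asking the C library for the locale's codeset from Perl. It assumes the
I18N::Langinfo module is usable on the platform; locale_codeset is a
hypothetical helper name, not something from this thread. Where the call is
unavailable the sketch returns undef, and where it works the spelling of the
returned name still differs from system to system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: return the codeset name reported by the C library
# for the user's locale, or undef if it cannot be determined here.
sub locale_codeset {
    setlocale(LC_CTYPE, '');    # adopt the locale from the environment
    my $name = eval {
        require I18N::Langinfo;  # may be missing or unsupported on some platforms
        I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
    };
    return (defined $name && length $name) ? $name : undef;
}

my $codeset = locale_codeset();
print defined $codeset ? "codeset: $codeset\n" : "codeset: unknown\n";
```

Even when a name comes back, it describes only the current user's locale, not
the encoding in which some other user wrote an existing file name.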

This isn't trivial,
but Mozilla solved this same problem. It can be made to work portably. (Are you
listening, Brian Stell?) It wasn't easy for them, but they did it.

I don't see him in the copy list so he probably isn't.
Mozilla is a (suite of) applications. Perl is a programming environment.
Mozilla has to solve the "problem", perl has to make it possible for 
the programmer to solve the problem. We believe we have.


Imagine ...

I don't have to imagine. But I think that where a Perl script opens its
files is its own business. 

Quite - and it is up to the script/program to deal with this issue.

if (-d encode('sjis',$path)) # or whatever
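
Spelled out a little more, the one-liner above might look like the sketch
below. It assumes the program already knows which encoding its file system
expects; is_dir_encoded is a hypothetical helper name, and 'shiftjis' is only
an example choice.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: $path is a Perl character string, $fs_encoding is
# whatever encoding the program has decided its file system uses.
sub is_dir_encoded {
    my ($path, $fs_encoding) = @_;
    # FB_CROAK: die if a character cannot be represented in $fs_encoding.
    # LEAVE_SRC: do not modify the caller's string while checking.
    my $bytes = encode($fs_encoding, $path,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
    return (-d $bytes) ? 1 : 0;
}

print is_dir_encoded('.', 'shiftjis') ? "directory\n" : "not a directory\n";
```

The file test sees only the encoded bytes, so the choice of encoding stays
with the script.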




Here's my dilemma: utf-8 doesn't work as an argument to -d and neither does
Shift-JIS (at least with certain Shift-JIS characters). 

If there is a bug which prevents you passing what your system requires
then set this out clearly as a bug report, via perlbug or some other 
mechanism which gives us details of your perl (perl -V etc.)

I suspect that the Win32 '\\' issue will need help from ActiveState
as the de facto Win32 porting experts. If a trailing '\\' has to be stripped
because the OS objects then that isn't really perl's problem.
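
One way a trailing backslash can interact badly with Shift-JIS, offered as an
assumption about the kind of case being discussed rather than a diagnosis:
some double-byte characters end in the byte 0x5C, which is also the backslash.
U+8868, for example, encodes as 0x95 0x5C. A sketch that strips the separator
from the character string before encoding avoids cutting such a character in
half. trim_then_encode is a hypothetical helper, and 'cp932' (the Windows
variant of Shift-JIS) is an example choice.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: strip trailing backslashes from a character string
# (leaving a drive root such as "C:\" alone), then encode for the OS.
sub trim_then_encode {
    my ($path, $fs_encoding) = @_;
    (my $trimmed = $path) =~ s/(?<=[^\\:])\\+\z//;
    return encode($fs_encoding, $trimmed);
}

my $name  = "C:\\data\\\x{8868}";    # last character is U+8868, not a separator
my $bytes = trim_then_encode($name, 'cp932');

# For comparison: stripping after encoding removes the trail byte 0x5C.
(my $naive = encode('cp932', $name)) =~ s/\\+\z//;
print "byte-level strip damaged the name\n" if $naive ne $bytes;
```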

We may reach the point where it makes sense to have a pragma
which enables auto encode/decode of args to system calls, but
I don't think we understand common practice (or whether such practices
are even established yet) well enough to specify one.



