perl-unicode

Re: perlunicode comment - when Unicode does not happen

2003-12-23 11:30:05
"Nick Ing-Simmons" <nick(_at_)ing-simmons(_dot_)net> wrote in message
news:20031223121741(_dot_)2857(_dot_)13(_at_)llama(_dot_)ing-simmons(_dot_)net(_dot_)(_dot_)(_dot_)
> If there is a bug which prevents you passing what your system requires
> then set this out clearly as a bug report, via perlbug or some other
> mechanism which gives us details of your perl (perl -V etc.)


The point I'm trying to make (agreeing with most perl 5 porters I suspect)
is that supporting Shift-JIS in Perl5 is hopeless. There is way too much
code (both C and .pm) that is multi-byte ignorant to fix it all. And - even
if one had the time - there's no infrastructure that allows you to fix it
(How do I fix string-handling code when I don't know the encoding?) However,
using UTF-8 would be a great work-around - a new code path that could be
made to work at least for core features like "-d". But, oops, it is a
dead-end since Perl doesn't do anything reasonable with UTF-8 when it makes
a system call. (There's still a ton of other stuff broken, but folks can get
around that later - let's fix the core now.)

> We may reach the point where it makes sense to have a pragma
> which enables auto encode/decode of args to system calls, but
I'd suggest taking some code from ICU or Mozilla that tries to figure out
what the platform encoding is. Then, Perl can do a utf8/platform encoding
conversion before/after the file-system related calls. In the (many -
although way less popular) cases where the platform-encoding detection code
just can't do it (or Encode doesn't support the answer), Perl just leaves
things the way they are today. A 'use system-encoding "foo";' pragma would
provide an escape hatch. This solution doesn't break anything and makes at
least 90% of the world (reasonably) happy.
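
To make the idea concrete, here is a rough sketch of what such a conversion
layer might look like for "-d". This is only an illustration under stated
assumptions, not anything Perl does today: platform_encoding() and is_dir()
are hypothetical names invented for this example, and the locale codeset
reported by I18N::Langinfo stands in for the ICU/Mozilla-style detection.
Only Encode and I18N::Langinfo are real modules here.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: guess the platform encoding from the locale.
# Returns undef when detection fails or Encode does not know the name.
sub platform_encoding {
    my $name = eval { langinfo(CODESET()) };
    return undef unless defined $name && length $name;
    return find_encoding($name) ? $name : undef;
}

# Hypothetical wrapper around -d. With a known encoding, the character
# string is converted to platform bytes before the file test. With no
# encoding (undef), the path is passed through unchanged, as today.
# An explicit second argument plays the role of the escape hatch.
sub is_dir {
    my ($path, $enc) = @_;
    $enc = platform_encoding() unless @_ > 1;
    my $bytes = defined $enc ? encode($enc, $path) : $path;
    return (-d $bytes) ? 1 : 0;
}
```

A caller would write is_dir($path) to rely on detection, or
is_dir($path, 'shiftjis') to force a particular encoding. The same
encode-before / decode-after pattern would apply to open, opendir, readdir
and friends.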

> I don't think we understand common practice (or that such practices
> are even established yet) well enough to specify that yet.

I may be misunderstanding your point, but I don't see "common practice"
bearing on this. UTF-8 in Perl is new - and currently it is dead in the
water for things like "-d" - so why not just fix it?

Regards,

=ED

