perl-unicode

Re: Unicode - it ain't just for character encoding/classification any more

1998-10-01 00:39:21

Gurusamy Sarathy writes:
On Wed, 30 Sep 1998 18:38:40 PDT, Larry Wall wrote:
jhi(_at_)iki(_dot_)fi writes:
: http://www.unicode.org/unicode/reports/tr10/index.html

Yow.  Er, is there anyone out there who would be totally and insanely
delighted to work on this?  It looks like an, um, *interesting* problem.
Fortunately it need have no impact on the core, so I feel free to
delegate it.

Looks to me like all this makes C<use locale> a dead-end.  I wonder
where the locale champion has gone a-hiding... ;-)

He's wearing his false Configure moustache and a false French accent.

Yes, this means more overlap with the locale system.  We already have
the LC_CTYPE (character classes [\w] et al) functionality more or less
covered, and the above draft covers a large part of LC_COLLATE (sorting
[cmp]).

The formatting (LC_NUMERIC [CORE::printf], LC_TIME [POSIX::strftime]
(and LC_MONETARY, but Perl doesn't care about money)) and message
catalog (LC_MESSAGES) are still untouched.

I'd also point out that the draft explicitly says language-specific
sorting is a separate matter: the ordering it defines is a
Unicode-wide default over characters and has nothing to do with any
particular language.  The algorithm in the draft just leaves a gap wide
enough to plug in language-specific ordering.

As a classic (well, I use it all the time) example, German and
Swedish/Finnish cannot be sorted in the same list because their ä and ö
(a-diaeresis and o-diaeresis) sort to completely different places.
Without knowing the language (which the locale system knows) you just
cannot sort things.  How Unicode defines and uses 'languages' or
whether Unicode has plans for such, I don't know.
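
To make the ä/ö point concrete, here is a minimal sketch in Python (not
Perl, and not the algorithm from the draft).  It assumes a toy default
ordering that ignores diacritics, which is roughly the German dictionary
convention, plus a hypothetical tailoring table that moves å, ä and ö
after z as Swedish and Finnish do.  The names sort_key and
SWEDISH_TAILORING are made up for illustration.

```python
import unicodedata

# Hypothetical tailoring: Swedish/Finnish place these letters after "z".
SWEDISH_TAILORING = {"\u00e5": 1, "\u00e4": 2, "\u00f6": 3}  # å, ä, ö


def sort_key(word, tailoring=None):
    """Build a sort key: default ordering unless `tailoring` overrides a letter."""
    tailoring = tailoring or {}
    primary = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in tailoring:
            # Tailored letters sort after every untailored letter.
            primary.append((1, tailoring[ch]))
        else:
            # Default: compare by base letter, ignoring diacritics.
            base = unicodedata.normalize("NFD", ch)[0]
            primary.append((0, ord(base)))
    # The original spelling breaks ties, so "apfel" precedes "äpfel".
    return (primary, word)


words = ["zebra", "öl", "äpfel", "ofen", "apfel"]
german = sorted(words, key=sort_key)
swedish = sorted(words, key=lambda w: sort_key(w, SWEDISH_TAILORING))
print(german)
print(swedish)
```

The same five words come out in two different orders depending only on
which tailoring is plugged in, which is why the language has to be known
before sorting.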

 - Sarathy.
   gsar(_at_)engin(_dot_)umich(_dot_)edu
-- 
$jhi++; # http://www.iki.fi/~jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
