perl-unicode

Re: Converting string to UTF-16LE

2004-03-02 18:30:07
Offhand (and I'm just guessing here from the contents of the hashes),
somebody has overgeneralized somewhere, and applied language-specific
transformations when they're not desired, with the result that utf8
strings have to be prepared to change lengths at various times.  And
changing string lengths is always going to slow you down compared to
doing things in place.

Well, that somebody has probably been me... but I cannot think of any
"language-specific transformations" I would have been thinking of, nor
any "prepared to change lengths" thinking either.  Maybe I just wasn't
thinking, period... I'll take a look at the bug report.

Anyway, sounds to me like someone has mixed Level 3 support into levels
1 and 2. If that's the case, I think it's a fundamental mistake. Perl 5
should pick a level to default to, and stick with it.

Ahhh.  Now I think I know what you are thinking of.

If I recall correctly, the case tables were added in response to the Unicode
CaseFolding table (lib/unicore/CaseFolding.txt), which does indeed define
language-independent foldings that are more complex than usual (mostly
caused by encoding irregularities in Unicode).  Maybe just the placement
of those tables is wrong.  Again, there is nothing language-specific in
those.  I'll have a look.
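
To make "more complex than usual" concrete, here is a small sketch of
language-independent full case folding changing the number of characters
in a string.  It is written in Python rather than Perl purely as an
illustration, on the assumption that str.casefold() applies the full
foldings from the Unicode case folding data; the helper name fold_growth
is made up for this example and is not part of any Perl or Unicode API.

```python
# Illustration only: full case folding can map one code point to several,
# with no locale or language setting involved.

def fold_growth(s):
    """Return (length before folding, length after full case folding)."""
    return (len(s), len(s.casefold()))

# Hypothetical sample set: sharp s, the "fi" ligature, capital I with dot above.
samples = ["\u00df", "\ufb01", "\u0130"]

for s in samples:
    before, after = fold_growth(s)
    print(f"U+{ord(s):04X}: {before} -> {after} code point(s)")
```

Each of these single characters folds to two, which is the kind of
length change the case tables have to allow for even though nothing
language-specific is going on.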

--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen