perl-unicode

Re: Converting string to UTF-16LE

2004-03-02 19:30:06
Larry Wall <larry(_at_)wall(_dot_)org> writes:
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because the search
: isn't case-sensitive, all letters should be uppercased, using the uc method.

I don't think this is your problem, but in general I think it's better
to canonicalize with lc() because it will try to undo both uppercase
and titlecase.

Since you are here ;-)

Why does ñ not uppercase to Ñ ?

I am no Larry, but I think I can answer this: it is the old mess of
8-bit versus Unicode. In the old world of 8-bit codepages, ñ upcases
to Ñ only if toupper() says so, which normally needs a "use locale"
somewhere, and even then it works only if your locale, as defined by
your vendor, says so. In the new world of Unicode, ñ upcases to Ñ if
the string is Unicode, that is, if Perl has it flagged as UTF-8
internally. For example, this works for me in a UTF-8 terminal window:

$ perl -CO -le '$a=chr(0xF1).chr(256);$b=uc($a);print $b'
ÑĀ

I believe that as soon as the IO stream that Ibañez is coming from is
marked as UTF-8, the ñ will upcase as expected.
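
To make the difference concrete, here is a rough sketch. The byte
string is my own made-up input, and I am assuming it arrives as
ISO-8859-1; adjust the encoding name to whatever your source really
delivers. The first uc() sees plain 8-bit bytes with no locale in
effect, the second sees a decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: "Ibañez" as raw ISO-8859-1 bytes (ñ is byte 0xF1).
my $bytes = "Iba\xF1ez";

# Byte semantics, no "use locale": only the ASCII letters are upcased,
# the 0xF1 byte is left alone.
my $naive = uc($bytes);

# Decode the bytes into a character string; 0xF1 becomes U+00F1 and the
# result is held as Unicode internally.
my $chars = decode('ISO-8859-1', $bytes);

# Unicode semantics: U+00F1 (ñ) maps to U+00D1 (Ñ).
my $upper = uc($chars);

# Encode on the way out so a UTF-8 terminal shows the result properly.
binmode(STDOUT, ':encoding(UTF-8)');
print "$upper\n";
```

The same thing happens without an explicit decode() if the filehandle
the data is read from carries an encoding layer.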

Which bits of which Unicode.org files are used by uc()?

pp_uc -> to_utf8_upper -> to_utf8_case, which uses the
lib/unicore/To/Foo.pl files; those have been generated from
UnicodeData.txt.
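
If you only want to see what the mapping for a given character is,
you do not have to read the generated tables. A small sketch using the
Unicode::UCD module bundled with Perl follows; note that this goes
through Unicode::UCD's public charinfo() interface, not through the
internal to_utf8_case path named above, so take it as a way to inspect
the UnicodeData.txt fields rather than a trace of what uc() does.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Look up U+00F1 (LATIN SMALL LETTER N WITH TILDE).
my $info = charinfo(0xF1);

# The 'upper' field holds the simple uppercase mapping as hex digits.
my $upper_hex = $info->{upper};

printf "U+%s %s -> upper U+%s\n", $info->{code}, $info->{name}, $upper_hex;
```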

--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen

