Larry Wall <larry(_at_)wall(_dot_)org> writes:
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because the search
: isn't case-sensitive, all letters should be uppercased, using the
: uc method.
I don't think this is your problem, but in general I think it's better
to canonicalize with lc() because it will try to undo both uppercase
and titlecase.
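
A minimal sketch of the lc() canonicalization suggested above. The helper name same_ignoring_case is made up for illustration, and the digraph U+01C4/U+01C5/U+01C6 is used only because it has distinct uppercase, titlecase and lowercase forms.

```perl
use strict;
use warnings;

# Hypothetical helper: canonicalize both sides with lc() before comparing.
sub same_ignoring_case {
    my ($left, $right) = @_;
    return lc($left) eq lc($right);
}

# U+01C4 (uppercase), U+01C5 (titlecase) and U+01C6 (lowercase) are the
# three case forms of one digraph; lc() maps each of them to U+01C6.
my @forms = ("\x{01C4}", "\x{01C5}", "\x{01C6}");
for my $form (@forms) {
    printf "U+%04X -> U+%04X\n", ord($form), ord(lc $form);
}

# A titlecased word and an all-uppercase word compare equal after lc().
print same_ignoring_case("\x{01C5}ungla", "\x{01C4}UNGLA")
    ? "match\n"
    : "no match\n";
```

Both sides of the comparison go through the same function, so it does not matter which case form the input arrived in.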
Since you are here ;-)
Why does ñ not uppercase to Ñ?
I am no Larry, but I think I can answer this: it is the old mess of
8-bit versus Unicode. In the old world of 8-bit codepages, ñ upcases
to Ñ only if toupper() says so, which normally needs a "use locale"
somewhere, and even then it doesn't work unless your locale, as defined
by your vendor, says so. In the new world of Unicode, ñ upcases to Ñ if
the string is Unicode. For example, this works for me in a UTF-8
terminal window:
$ perl -CO -le '$a=chr(0xF1).chr(256);$b=uc($a);print $b'
ÑĀ
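
The 8-bit versus Unicode difference can also be seen without a terminal, by comparing the same character stored two ways. This is only a sketch: it assumes no "use locale" and no unicode_strings feature is in effect, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $bytes = "\xF1";       # ñ as one 8-bit character, not flagged as UTF-8
my $chars = "\xF1";
utf8::upgrade($chars);    # same character, now stored internally as UTF-8

# Print the code point that uc() produces for each representation.
printf "8-bit string:   uc gives 0x%02X\n", ord(uc $bytes);
printf "Unicode string: uc gives 0x%02X\n", ord(uc $chars);
```

Under those assumptions the upgraded string should come back as 0xD1 (Ñ), while the 8-bit string is expected to stay at 0xF1.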
I believe that as soon as the IO stream that Ibañez is coming from is
marked UTF-8, the ñ will upcase as expected.
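
One way to mark a stream as UTF-8 is an encoding layer on open(). The sketch below uses an in-memory filehandle as a stand-in for the real source of the search value, which the thread does not show, so the variable names and the input are assumptions.

```perl
use strict;
use warnings;

# Stand-in for real input: the UTF-8 bytes of "Ibañez\n" (ñ is 0xC3 0xB1).
my $raw = "Iba\xC3\xB1ez\n";

# Read through a decoding layer so perl knows the data is UTF-8 text.
open(my $in, '<:encoding(UTF-8)', \$raw) or die "cannot open: $!";
my $name = <$in>;
close($in);
chomp($name);

my $upper = uc($name);

# Encode on the way out so a UTF-8 terminal receives UTF-8 again.
binmode(STDOUT, ':encoding(UTF-8)');
print "$upper\n";
```

For a real file or socket the same layer can be given to open() directly, or added afterwards with binmode().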
Which bits of which Unicode.org files are used by uc()?
pp_uc -> to_utf8_upper -> to_utf8_case, which uses the
lib/unicore/To/Foo.pl files, which are generated from UnicodeData.txt.
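
The case mappings that come from UnicodeData.txt can be inspected from Perl with the core Unicode::UCD module. This is a sketch for looking at the data only; it is not a claim that uc() goes through this module.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Look up the UnicodeData.txt-derived record for U+00F1 (ñ).
my $info = charinfo(0xF1);

# The case fields are hex strings, empty when there is no mapping.
printf "name:  %s\n",   $info->{name};
printf "upper: U+%s\n", $info->{upper};
```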
--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'."
-- Jack Cohen