Re: Converting string to UTF-16LE

Larry Wall <larry(_at_)wall(_dot_)org> writes:
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because the search
: isn't case-sensitive, all letters should be uppercased, using the uc method.
I don't think this is your problem, but in general I think it's better
to canonicalize with lc() because it will try to undo both uppercase
and titlecase.
Since you are here ;-)

Why does ñ not uppercase to Ñ ?

I am no Larry, but I think I can answer this -- it is the old mess of 8-bit versus Unicode. In the old world of 8-bit codepages the ñ upcases to Ñ only if the toupper() says so, which normally needs a "use locale" somewhere, and even then it doesn't work unless your locale as defined by your vendor says so. In the new world of Unicode the ñ upcases to Ñ if the string is Unicode. For example, this works for me in a UTF-8 terminal window:

$ perl -CO -le '$a=chr(0xD1).chr(256);$b=uc($a);print $b'
ÑĀ

I believe that as soon as the IO stream that Ibañez is coming from is marked UTF-8, the ñ will upcase as expected.
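
A minimal sketch of that last point, assuming the data arrives as raw UTF-8 bytes and is decoded with the core Encode module. The sub name upcase_utf8_bytes is made up for illustration, and the literal byte string only stands in for whatever the real IO stream delivers:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode UTF-8 bytes into characters, then uppercase.
sub upcase_utf8_bytes {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # byte string -> character string
    return uc($chars);
}

# "Ibañez" as UTF-8 bytes: ñ (U+00F1) is the two bytes C3 B1.
my $bytes = "Iba\x{C3}\x{B1}ez";

# On the undecoded bytes, only the ASCII letters change.
my $upper_bytes = uc($bytes);

# On the decoded string, ñ becomes Ñ (U+00D1).
my $upper_chars = upcase_utf8_bytes($bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "$upper_chars\n";
```

For a filehandle, an ':encoding(UTF-8)' layer on open or binmode should have the same effect as the explicit decode() call here.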

Which bits of which Unicode.org files are used by uc()?


pp_uc -> to_utf8_upper -> to_utf8_case, which uses the lib/unicore/To/Foo.pl files,
which have been created from UnicodeData.txt.
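
To look at the mapping that ends up in those tables, the core Unicode::UCD module can be queried. This is only a sketch: it reads Perl's own copy of the Unicode database rather than walking the pp_uc call chain, and the helper name upper_of is invented here.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: return the simple uppercase mapping of a code point
# as a number, or the code point itself when no mapping is recorded.
sub upper_of {
    my ($cp) = @_;
    my $info = charinfo($cp) or return $cp;
    my $hex = $info->{upper};              # hex string, empty if no mapping
    return (defined $hex && length $hex) ? hex($hex) : $cp;
}

# ñ (U+00F1) and ā (U+0101), the lowercase partners of Ñ and Ā.
printf "U+%04X -> U+%04X\n", 0xF1,  upper_of(0xF1);
printf "U+%04X -> U+%04X\n", 0x101, upper_of(0x101);
```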

--

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
