Re: Weird interaction of ord, split, and substr with UTF-8?

At 9:01 PM +0100 10/31/00, Andreas J. Koenig wrote:

I'd highly recommend falling back to Unicode::String, there are too
many bugs in all perls since the model was changed from marking code
to marking strings.


This sounds reasonable to me. It was exciting to try, however!

 You do not need UCS-4 for your example, there is
$u->substr and $u->ord!

<thwack> (The sound of my palm hitting my forehead) You mean I should
read past the first ten lines of the Unicode::String man page?!? :-)
Yep, this looks exactly right. Now let's see if it works with real
data. Thanks!
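
For reference, a minimal sketch of the character-level calls mentioned above, assuming Unicode::String is installed from CPAN and the input arrives as raw UTF-8 bytes. The sample string and variable names are made up for illustration and are not taken from the program discussed in this thread.

```perl
use strict;
use warnings;
use Unicode::String qw(utf8);

# Hypothetical sample input: "Dürst" written out as raw UTF-8 bytes.
my $u = utf8("D\xC3\xBCrst");

# substr works on the Unicode::String object, so offsets count
# characters here rather than UTF-8 bytes.
my $second = $u->substr(1, 1);

# ord returns the code point of the first character of the object.
my $code = $second->ord;

printf "U+%04X\n", $code;
```

Because the string is wrapped in an object, there is no need to convert to UCS-4 by hand just to index individual characters.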

FWIW, I'm writing a program to do domain name preparation, which is
being worked on in the IETF's IDN WG. I'm doing lowercasing and
checking for prohibited characters myself, and handing off
normalization to Martin Dürst's charlint.pl. I'll be making my
program public, and will let this list know when I think it is
somewhat ready.


--Paul Hoffman