perl-unicode

Re: Weird interaction of ord, split, and substr with UTF-8?

2000-10-31 14:18:33
At 9:01 PM +0100 10/31/00, Andreas J. Koenig wrote:
> I'd highly recommend falling back to Unicode::String, there are too
> many bugs in all perls since the model was changed from marking code
> to marking strings.

This sounds reasonable to me. It was exciting to try, however!

> You do not need UCS-4 for your example, there is
> $u->substr and $u->ord!

<thwack> (The sound of my palm hitting my forehead) You mean I should read past the first ten lines of the Unicode::String man page?!? :-) Yep, this looks exactly right. Now let's see if it works with real data. Thanks!

FWIW, I'm writing a program to do domain name preparation, which is being worked on in the IETF's IDN WG. I'm doing lowercasing and checking for prohibited characters myself, and handing off normalization to Martin Dürst's charlint.pl. I'll be making my program public, and will let this list know when I think it is somewhat ready.

--Paul Hoffman
