perl-unicode

matching multibyte utf-8 in perl

2003-03-01 06:30:06
Jarkko:
I saw your post on the perl-unicode mailing list:

From: Jarkko Hietaniemi [mailto:jhi(_at_)iki(_dot_)fi] 
Sent: Friday, January 10, 2003 1:39 PM
To: Merijn van den Kroonenberg
Cc: Narins, Josh; perl-unicode(_at_)perl(_dot_)org
Subject: Re: beginner's 5.6.1 latin1<->utf8 question


On Fri, Jan 10, 2003 at 07:28:00PM +0100, Merijn van den Kroonenberg wrote:
> You might be looking for these:
> # ISO 8859-1 to UTF-8
> s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
> # UTF-8 to ISO 8859-1
> s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
> I think that will work (they are not mine, so don't blame me if not ;-)

They are mine :-), so I feel free to say that they don't do &#NNN;
conversion... but they certainly could be changed to do so.
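The sketch below wraps the two one-liners above in subs so they can be tried directly. It assumes the input is a byte string, for example a file read without an :encoding layer. The sub names latin1_to_utf8, utf8_to_latin1 and utf8_to_ncr are hypothetical.

The third sub is only one way the UTF-8 to ISO 8859-1 substitution might be changed to emit &#NNN; references. It uses the same bit arithmetic, widened to any 2-byte sequence (U+0080..U+07FF). It does not handle 3- or 4-byte sequences.

```perl
use strict;
use warnings;

# ISO 8859-1 -> UTF-8: each byte \x80-\xFF becomes a 2-byte sequence.
# Lead byte is 0xC0 | top 2 bits; continuation is 0x80 | low 6 bits.
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $s;
}

# UTF-8 -> ISO 8859-1: only lead bytes \xC2/\xC3 map into Latin-1
# (U+0080..U+00FF); other multibyte sequences are left untouched.
sub utf8_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $s;
}

# Hypothetical variant: emit &#NNN; numeric character references instead
# of Latin-1 bytes. Code point = (low 5 bits of lead) << 6 | (low 6 bits
# of continuation). Only 2-byte sequences (lead \xC2-\xDF) are converted.
sub utf8_to_ncr {
    my ($s) = @_;
    $s =~ s/([\xC2-\xDF])([\x80-\xBF])/'&#' . ((ord($1) & 0x1F) << 6 | (ord($2) & 0x3F)) . ';'/eg;
    return $s;
}
```

For example, utf8_to_ncr("caf\xC3\xA9") returns "caf&#233;".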


I am also a beginner, and my task is to find and count the non-ASCII
characters in a UTF-8 text. How do I do this?
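Below is a minimal sketch of two possible approaches. It assumes the text has been read as raw bytes, for example with open(my $fh, '<:raw', $path). The sub names and the $MULTIBYTE pattern are hypothetical.

- **Approach 1** matches UTF-8 multibyte sequences directly on the bytes. It is a plain regex and should also work on the Perl 5.6.1 mentioned in the thread. The pattern is simplified: it does not reject overlong forms, surrogates or other malformed input.
- **Approach 2** decodes with the core Encode module and needs Perl 5.8 or later.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Approach 1 (byte level, no Encode): every non-ASCII character in UTF-8
# is one lead byte (\xC2-\xF4) followed by 1-3 continuation bytes.
my $MULTIBYTE = qr/
      [\xC2-\xDF][\x80-\xBF]       # 2 bytes: U+0080..U+07FF
    | [\xE0-\xEF][\x80-\xBF]{2}    # 3 bytes: U+0800..U+FFFF
    | [\xF0-\xF4][\x80-\xBF]{3}    # 4 bytes: U+10000..U+10FFFF
/x;

# Returns the total number of non-ASCII characters found, plus a hash
# mapping each distinct UTF-8 byte sequence to its number of occurrences.
sub count_non_ascii_bytes {
    my ($octets) = @_;
    my $total = 0;
    my %seen;
    while ($octets =~ /($MULTIBYTE)/g) {
        $total++;
        $seen{$1}++;
    }
    return ($total, \%seen);
}

# Approach 2 (character level, Perl 5.8+): decode the octets into a Perl
# character string, then count characters outside \x00-\x7F.
sub count_non_ascii_chars {
    my ($octets) = @_;
    my $text  = decode('UTF-8', $octets);
    my $count = () = $text =~ /[^\x00-\x7F]/g;
    return $count;
}
```

The byte-level version also tells you *which* characters occur, because the hash keys are the raw UTF-8 sequences. The decoded version is shorter, and Encode takes care of decoding the sequences, so no hand-written byte-range pattern is needed.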
