processing utf-8 data

Hi all,
is it possible to process UTF-8 encoded text data intelligently? I need to do
something like:
- split text into words
- remove characters that are illegal for a specified language
- remove control characters
- ...
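
To make the steps above concrete, here is a rough sketch of the behaviour I have in mind. It is written in Python purely as an illustration, not as a proposed solution; the function name `clean` and the `CZECH_LETTERS` whitelist are hypothetical, and Czech is only an assumed example language.

```python
import re
import unicodedata

# Hypothetical whitelist: letters treated as legal for the example language (Czech).
CZECH_LETTERS = set("aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž")

def clean(raw, allowed=CZECH_LETTERS):
    """Decode UTF-8 bytes and return a list of cleaned words."""
    # Work on characters, not bytes; NFC keeps accented letters precomposed.
    text = unicodedata.normalize("NFC", raw.decode("utf-8"))

    # Control characters (general category Cc): whitespace controls such as
    # tab or newline become a space, all others are dropped.
    kept = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":
            if ch.isspace():
                kept.append(" ")
        else:
            kept.append(ch)
    text = "".join(kept)

    # Split into words; on str patterns \w matches Unicode word characters.
    words = re.findall(r"\w+", text)

    # Remove characters that are not legal for the language, drop empty words.
    cleaned = ("".join(c for c in w if c.lower() in allowed) for w in words)
    return [w for w in cleaned if w]
```

For example, `clean("kůň, naïve".encode("utf-8"))` would keep `kůň` unchanged and reduce `naïve` to `nave`, because `ï` is not in the assumed whitelist.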

Which module do I need to use? There are a lot of modules for charset
conversion. I found Unicode::String to be useful, but of the latin*
encodings it supports only latin1.

How can I prevent false matches in regular expressions when working
with multibyte characters?
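
A small sketch of the kind of false match I mean, again in Python only as an illustration (the variable names are made up for the example). A character class built from the raw UTF-8 bytes of "š" (0xC5 0xA1) also matches inside "ř" (0xC5 0x99), because both share the lead byte 0xC5; the same class applied to decoded text does not.

```python
import re

text = "řeka"                  # contains U+0159 (ř), but no U+0161 (š)
raw = text.encode("utf-8")     # bytes: C5 99 65 6B 61

# Byte-level class built from the UTF-8 bytes of "š": matches either byte alone.
byte_class = re.compile(b"[" + "š".encode("utf-8") + b"]")
false_hit = byte_class.search(raw) is not None   # True: hits the shared byte 0xC5

# Character-level class on decoded text: matches only the character š itself.
char_class = re.compile("[š]")
true_hit = char_class.search(raw.decode("utf-8")) is not None   # False
```

The difference comes only from matching on bytes versus matching on decoded characters.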

-- 
 best regards
  Ing. Roman Vasicek

 software developer
+----------------------------------------------------------------------------+
 PetaMem s.r.o., Drahobejlova 27/1019, 190 00 Praha 9 - Liben, Czech republic
 http://www.petamem.com/
