Re: Detecting/Decoding Unicode Text


On Apr 7, 2004, at 9:13 AM, SADAHIRO Tomoyuki wrote:


On Tue, 6 Apr 2004 18:05:32 -0400
gohaku <gohaku(_at_)earthlink(_dot_)net> wrote:

Hi everyone,
I have some (actually many) records in a database that I want to
"clean".
Some of these records contain Unicode text (mostly East Asian).

I have tried matching for "\W+" and "\S+", but that is not what I am
looking for because I would like to keep "&" and "-".

Thanks in advance.
-gohaku


Hello. The right solution may depend on what kind of contamination
is mixed into your records.

If the contamination consists of unassigned code points, which should
not be used, \p{Assigned}+ may be useful.
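
As a rough sketch of how \p{Assigned} could be applied (the helper names all_assigned and strip_unassigned are made up for illustration, and the code assumes the record has already been decoded into Perl characters rather than raw bytes):

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character in $text is an assigned
# Unicode code point. Assumes $text is a decoded character string.
sub all_assigned {
    my ($text) = @_;
    return $text =~ /\A\p{Assigned}+\z/ ? 1 : 0;
}

# Hypothetical helper: return a copy of $text with runs of unassigned
# code points removed. Punctuation such as "&" and "-" is assigned,
# so it is kept.
sub strip_unassigned {
    my ($text) = @_;
    (my $clean = $text) =~ s/\P{Assigned}+//g;
    return $clean;
}
```

This only helps if the unwanted characters really are unassigned code points; anything assigned, including East Asian text, passes through untouched.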


SADAHIRO Tomoyuki

Not the best solution, but I decided to match for [A-Z] at the beginning of the record.

if ($title =~ /^[A-Z]/i)
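
For illustration, here is a minimal sketch of that check applied to a list of records. The sample titles and the helper name starts_with_ascii_letter are invented for the example, and the strings are assumed to be decoded character strings:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the check above: true if the record
# begins with a letter A-Z, in either case.
sub starts_with_ascii_letter {
    my ($title) = @_;
    return $title =~ /^[A-Z]/i ? 1 : 0;
}

# Made-up sample records: two start with a Latin letter, one starts
# with East Asian text, one starts with punctuation.
my @titles = ("Apple & Pear", "\x{6771}\x{4EAC} Tower", "b-side", "-dash");

# Keep only the records that pass the check.
my @kept = grep { starts_with_ascii_letter($_) } @titles;
```

Note that this only looks at the first character, so a record that starts with a Latin letter but contains East Asian text later on is still kept.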
