Help with a regex

Hello,

I have a file that was saved in utf-16 which got converted to
non-unicode and lost several unicode characters in the process
(en-space, thin space, etc). I am now working with a previous version of
this file which is still in utf-16, and I need to search it for all of
the characters which would have been mangled by saving in the
non-unicode format.

I'm pretty sure the regex should look something like:

 # any character above U+00FF (assumes the lossy format was Latin-1-like)
 if ($line =~ /[\x{0100}-\x{FFFF}]/) {
   # do stuff
 }

But I don't know enough about the hex representation of Unicode to know
what exactly the regex should be.
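
To make the question concrete, here is a minimal sketch of the search I have in mind. It assumes the lines have already been decoded from UTF-16 into Perl character strings, and that the "non-unicode" format can only hold code points up to U+00FF (true for ISO-8859-1; a Windows code page such as cp1252 would keep a few higher characters, so the range would need adjusting). The helper name chars_above_latin1 and the file name in the comments are made up for illustration.

```perl
use strict;
use warnings;

# Return the distinct characters in $line that lie above U+00FF,
# i.e. those a Latin-1-style single-byte encoding cannot represent.
# Assumption: $line is a decoded character string, not raw UTF-16 bytes.
sub chars_above_latin1 {
    my ($line) = @_;
    my %seen;
    # In list context, a /g match returns every captured character.
    return grep { !$seen{$_}++ } $line =~ /([^\x{00}-\x{FF}])/g;
}

# Hypothetical usage (placeholder file name); the :encoding(UTF-16)
# layer relies on a byte-order mark to pick the endianness.
#
# open(my $fh, '<:encoding(UTF-16)', 'previous_version.txt')
#     or die "open: $!";
# while (my $line = <$fh>) {
#     printf "line %d: U+%04X\n", $., ord($_)
#         for chars_above_latin1($line);
# }
```

The negated class [^\x{00}-\x{FF}] also catches characters beyond U+FFFF, which the \x{0100}-\x{FFFF} range would miss.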

Thanks in advance,

 -dave g
