Dear PERLists,
I am running Perl 5.8 and trying to filter out some invalid Unicode characters
from Unicode-encoded texts in some South Asian languages. There are 28 such characters
in my data (all C0 control characters except U+FFFF, which is a noncharacter):
0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0xB, 0xC, 0xE, 0xF, 0x10, 0x11,
0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F, 0xFFFF
The data is encoded as UTF-16, and I want to keep it that way once the invalid
characters are removed. Is there an easy way to do this in Perl without
damaging the rest of the text? Any advice is welcome. Thanks.
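One possible approach, sketched here rather than offered as the definitive answer: since all 28 values lie in the Basic Multilingual Plane and none falls in the surrogate range 0xD800 to 0xDFFF, the file can be filtered as a plain sequence of 16-bit code units with unpack/pack, without decoding it first. Dropping such a unit can never split a surrogate pair, and the output keeps the original byte order and BOM. Assumptions that are NOT from the post: the file names come from the command line, output goes to a hypothetical "<name>.clean" file, and a file with no BOM is treated as big-endian (a BOM-less little-endian file would be misread). The helper name strip_utf16_units is also hypothetical.

```perl
#!/usr/bin/perl
# Sketch: remove 28 unwanted code points from UTF-16 files, keeping them UTF-16.
# Assumptions (hypothetical, not from the post): files are named on the command
# line, output goes to "<name>.clean", and a file with no BOM is big-endian.
use strict;
use warnings;

# The 28 values from the list above, as 16-bit code units:
# 0x01-0x08, 0x0B, 0x0C, 0x0E-0x19, 0x1B-0x1F and 0xFFFF.
my %invalid = map { $_ => 1 }
    (0x01 .. 0x08, 0x0B, 0x0C, 0x0E .. 0x19, 0x1B .. 0x1F, 0xFFFF);

# Drop unwanted code units from a UTF-16 byte string and return the result
# in the same byte order. None of the unwanted values lies in the surrogate
# range 0xD800-0xDFFF, so surrogate pairs are never split.
sub strip_utf16_units {
    my ($bytes) = @_;
    die "Odd number of bytes: not valid UTF-16\n" if length($bytes) % 2;

    # "n" = 16-bit big-endian, "v" = 16-bit little-endian.
    # A leading FF FE is taken as a little-endian BOM; anything else is
    # read as big-endian (a BOM-less little-endian file would be damaged).
    my $fmt = substr($bytes, 0, 2) eq "\xFF\xFE" ? 'v' : 'n';

    my @kept = grep { !$invalid{$_} } unpack("$fmt*", $bytes);
    return pack("$fmt*", @kept);   # the BOM, if any, is kept as U+FEFF
}

for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!\n";
    binmode $in;                                # raw bytes, no decoding
    my $bytes = do { local $/; <$in> };         # slurp the whole file
    close $in;
    $bytes = '' unless defined $bytes;

    open my $out, '>', "$file.clean" or die "Cannot write $file.clean: $!\n";
    binmode $out;
    print {$out} strip_utf16_units($bytes);
    close $out or die "Cannot close $file.clean: $!\n";
}
```

Usage would be something like "perl strip_utf16.pl a.txt b.txt" (script name hypothetical), producing a.txt.clean and b.txt.clean. Working on raw code units avoids any question of how a decoder treats U+FFFF, and tab (0x9), newline (0xA) and carriage return (0xD) are left alone because they are not in the list.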
Best,
Richard