perl-unicode

RE: Pattern matching with Unicode (5.6.1)

2002-08-15 14:30:04
I'm having a bit of a problem getting Unicode pattern 
matching to do what I would like it to.

I guess my question wasn't entirely clear. I'm reading in the attached
file and trying to split it on "\n\n".

When I'm looping over the file, I've (sort of) made it work by doing:

 # strip BOM and trailing nulls and carriage returns
 s/^..// if $. == 1 and s/\0//g;
 s/[\0\r]//g;

The two-byte BOM has me thinking it's probably UTF-16. Is there an easy
way to tell what encoding a file uses?
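
For illustration, here is a minimal sketch of one way to guess the encoding
from the byte-order mark. The sub names guess_encoding_from_bom and
guess_file_encoding are made up for this example. It only recognizes the
standard BOM byte sequences, so a file without a BOM comes back as undef,
and a UTF-16LE file whose first character is U+0000 would be misread as
UTF-32LE. Treat the result as a hint, not a proof.

```perl
use strict;
use warnings;

# Guess an encoding from a leading byte-order mark (BOM).
# Returns undef when no BOM is found; that only means this
# heuristic cannot tell, not that the data is plain ASCII.
sub guess_encoding_from_bom {
    my ($bytes) = @_;
    # Test the 4-byte UTF-32 marks before the 2-byte UTF-16 ones,
    # because the UTF-32LE BOM starts with the UTF-16LE BOM.
    return 'UTF-32LE' if $bytes =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $bytes =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;
}

# Read the first four bytes of a file in binary mode and guess from them.
sub guess_file_encoding {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $head = '';
    read($fh, $head, 4);
    close($fh);
    return guess_encoding_from_bom($head);
}
```

Calling guess_file_encoding('unicode.txt') on a file that starts with the
bytes FF FE would return 'UTF-16LE' under these assumptions.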

But I'm sure there must be a more elegant way to do this. 
Honestly, I'm not even sure where to start. Any ideas?
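
As a rough sketch of what a cleaner version might look like: decode the
bytes first, then split on blank lines. This assumes the file really is
UTF-16 with a BOM, and it assumes a Perl with the Encode module (5.8 or
later), which 5.6.1 does not ship with. The sub names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode raw UTF-16 bytes (BOM required) into Perl characters and
# split the text into blank-line-separated records.
sub split_utf16_paragraphs {
    my ($bytes) = @_;
    # With the name 'UTF-16', Encode reads the BOM to choose the
    # byte order and removes it from the decoded text.
    my $text = decode('UTF-16', $bytes);
    $text =~ s/\r\n?/\n/g;    # normalize CRLF and bare CR to LF
    $text =~ s/\n+\z//;       # drop trailing newlines
    # Skip any empty leading field left by blank lines at the start.
    return grep { length } split /\n{2,}/, $text;
}

# Slurp a file as raw bytes and hand it to the splitter above.
sub read_utf16_paragraphs {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    local $/;                 # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close($fh);
    return split_utf16_paragraphs($bytes);
}
```

Decoding before matching means "\n\n" is compared against characters
rather than against interleaved NUL bytes, so no manual stripping of the
BOM or of \0 is needed.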

Thanks a bunch,

 -dave

Attachment: unicode.txt
Description: Text document
