Hello most excellent Unicode list,
I'm having a bit of a problem getting Unicode pattern matching to do
what I would like it to. My code somewhat resembles:

    sub parse_doc {
        my $file = shift;
        my $fh = do { no warnings; local *FH };
        open $fh, '<', $file or die "couldn't read [$file]: $!\n";
        my $contents = '';
        {
            local $/ = undef;
            $contents = <$fh>;
        }
        close $fh;
        # this is where I'm getting stuck
        my @contents = split "\n\n", $contents;
        print '[' . int(@contents) . "]\n";
    }

I've (sort of) made it work by doing:

    # strip BOM and trailing nulls and carriage returns
    s/^..// if $. == 1 and s/\0//g;
    s/[\0\r]//g;

But I'm sure there must be a more elegant way to do this. Honestly, I'm
not even sure where to start. Any ideas?
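
For what it's worth, here is the direction I'm guessing at, as a rough
sketch rather than something I know to be right. It assumes the file is
UTF-16 with a BOM (the nulls and the two leading bytes suggest that, but
I haven't confirmed it) and lets an :encoding(UTF-16) layer on open do
the decoding instead of my regexes. The name read_paragraphs is made up
for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a file and split it into paragraphs on
# blank lines.
sub read_paragraphs {
    my ($file) = @_;

    # ASSUMPTION: the file is UTF-16 and starts with a BOM. The
    # :encoding(UTF-16) layer reads the BOM to pick the byte order and
    # does not pass it through, so no manual stripping is needed.
    open my $fh, '<:encoding(UTF-16)', $file
        or die "couldn't read [$file]: $!\n";

    # Slurp the whole (now decoded) file in one read.
    my $contents = do { local $/; <$fh> };
    close $fh;

    # Normalise CRLF and bare CR line endings to LF.
    $contents =~ s/\r\n?/\n/g;

    # Split on runs of two or more newlines, i.e. blank lines.
    return split /\n{2,}/, $contents;
}
```

If that is roughly the right idea, I would call it as
my @contents = read_paragraphs($file); and the decoded text would no
longer contain the nulls that break the "\n\n" split.
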
Thanks a bunch,
-dave