On Tue, 11 Dec 2001 21:40:36 +0000, awiar(_at_)hotmail(_dot_)com (Jalal Kakavand) wrote:
my $mydoc = shift ;
# check BOM
my $top1 = unpack("C", substr($mydoc, 0, 1));
my $top2 = unpack("C", substr($mydoc, 1, 1));
my $top3 = unpack("C", substr($mydoc, 2, 1));
# UTF-8
if($top1 eq 239 && $top2 eq 187 && $top3 eq 191) {
    $mydoc = substr($mydoc, 3, length($mydoc) - 3);
}
return $mydoc;
}
Another way to do it might be:
my $mydoc = shift;
my $bom = substr($mydoc, 0, 3);
# Check for UTF-8 BOM
if($bom eq "\xef\xbb\xbf") {
    substr($mydoc, 0, 3) = '';
}
return $mydoc;
That way, you can compare all three bytes at once. (Your method looks
more like C :) ... except that you used 'eq' for a numeric comparison,
where '==' is the usual operator; 'eq' happens to work here but just
looks wrong.) And I believe that by assigning to substr, you may avoid
copying the entire string, since Perl may simply remember that the real
data starts three bytes past the first allocated character (using OOK,
the "offset OK" flag, if you're into the internals).
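For what it's worth, here is a minimal sketch of the whole thing as a complete sub, in case it helps to see it run. The name strip_utf8_bom is made up for illustration (your original sub's name isn't shown), and the four-argument form of substr is just an equivalent spelling of the assignment above. It assumes $mydoc holds raw bytes; if the string has already been decoded to characters, the BOM would be the single character U+FEFF instead.

```perl
use strict;
use warnings;

# Hypothetical name; the sub name in the original post is not shown.
sub strip_utf8_bom {
    my $mydoc = shift;

    # Compare the first three bytes against the UTF-8 BOM in one go.
    if (substr($mydoc, 0, 3) eq "\xef\xbb\xbf") {
        # Four-argument substr replaces those three bytes with nothing.
        substr($mydoc, 0, 3, '');
    }
    return $mydoc;
}

# Both calls should print the same text, since only a leading BOM is removed.
print strip_utf8_bom("\xef\xbb\xbf<doc/>"), "\n";
print strip_utf8_bom("<doc/>"), "\n";
```

Strings shorter than three bytes fall through the comparison untouched, so no separate length check should be needed.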
Cheers,
Philip