On Tue, 11 Dec 2001 21:40:36 +0000, awiar(_at_)hotmail(_dot_)com (Jalal
my $mydoc = shift ;
# check BOM
my $top1 = unpack("C", substr($mydoc, 0, 1));
my $top2 = unpack("C", substr($mydoc, 1, 1));
my $top3 = unpack("C", substr($mydoc, 2, 1));
# UTF-8
if($top1 eq 239 && $top2 eq 187 && $top3 eq 191) {
    $mydoc = substr($mydoc, 3, length($mydoc) - 3);
}
return $mydoc;
Another way to do it might be:
my $mydoc = shift;
my $bom = substr($mydoc, 0, 3);
# Check for UTF-8 BOM
if($bom eq "\xef\xbb\xbf") {
    substr($mydoc, 0, 3) = '';
}
return $mydoc;
That way, you can compare all three bytes at once (your method looks
more like C :)... except that you used 'eq' for a numeric comparison,
which just looks wrong; '==' is the numeric operator). And I believe
that by assigning to substr, you may avoid copying the entire string,
since Perl may simply remember that the real data starts three bytes
past the first allocated character (using the OOK offset flag, if
you're into the internals).
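
For completeness, here is a minimal sketch of the second approach wrapped
in a sub so it can be run as-is. The name strip_utf8_bom is made up for
illustration (the snippets above show only the body), and the sketch
assumes $mydoc holds raw bytes; if the data had already been decoded
from UTF-8, the BOM would be the single character U+FEFF instead. It
uses the four-argument form of substr, which does the same job as
assigning to substr.

```perl
use strict;
use warnings;

# Hypothetical helper name; the post above shows only the sub body.
# Assumes $mydoc is a byte string, not decoded characters.
sub strip_utf8_bom {
    my $mydoc = shift;

    # Compare the first three bytes against the UTF-8 BOM in one step.
    if (substr($mydoc, 0, 3) eq "\xef\xbb\xbf") {
        # Four-argument substr replaces the prefix in place.
        substr($mydoc, 0, 3, '');
    }
    return $mydoc;
}

# The BOM is removed when present; other input passes through untouched.
print strip_utf8_bom("\xef\xbb\xbfhello"), "\n";
print strip_utf8_bom("hello"), "\n";

# A string shorter than the BOM is left alone.
print length(strip_utf8_bom("\xef\xbb")), "\n";
```

Whether the in-place removal really avoids a copy depends on the Perl
build and on how the sub is called, so treat that part as a possible
optimisation rather than a guarantee.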