Re: Strange characters when displaying html files saved in UTF-8 format

On Tue, 11 Dec 2001 21:40:36 +0000, awiar(_at_)hotmail(_dot_)com (Jalal Kakavand) wrote:

      my $mydoc = shift;
      # check BOM
      my $top1 = unpack("C", substr($mydoc, 0, 1));
      my $top2 = unpack("C", substr($mydoc, 1, 1));
      my $top3 = unpack("C", substr($mydoc, 2, 1));

      # UTF-8
      if($top1 eq 239 && $top2 eq 187 && $top3 eq 191) {
              $mydoc = substr($mydoc, 3, length($mydoc) - 3);
      }

      return $mydoc;
}


Another way to do it might be:

    my $mydoc = shift;
    my $bom = substr($mydoc, 0, 3);
    # Check for UTF-8 BOM
    if($bom eq "\xef\xbb\xbf") {
        substr($mydoc, 0, 3) = '';
    }
    return $mydoc;

That way, you can compare all three bytes at once (your method looks
more like C :) ... except that you used 'eq' for a numeric comparison,
which just looks wrong; '==' is the numeric operator). And I believe
that by assigning to substr, you may avoid copying the entire string,
since Perl may simply remember that the real data starts three bytes
past the first allocated character (using the OOK offset flag, if
you're into the internals).

Cheers,
Philip
