perl-unicode

Re: Strange characters when displaying html files saved in UTF-8 format

2001-12-11 14:41:10
Yes, the problem was related to the BOM.

I used this function to strip the BOM bytes from the start of a UTF-8 file:


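sub parse($) {
    # Assumes $mydoc holds the raw bytes of the file, not decoded characters.
    my $mydoc = shift;

    # Too short to start with a three-byte BOM.
    return $mydoc if length($mydoc) < 3;

    # Read the first three bytes as unsigned integers.
    my $top1 = unpack("C", substr($mydoc, 0, 1));
    my $top2 = unpack("C", substr($mydoc, 1, 1));
    my $top3 = unpack("C", substr($mydoc, 2, 1));

    # The UTF-8 BOM is the byte sequence EF BB BF (239 187 191).
    if ($top1 == 239 && $top2 == 187 && $top3 == 191) {
        $mydoc = substr($mydoc, 3);
    }

    return $mydoc;
}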

The idea came from the parse function in the following code:

http://dev.w3.org/cvsweb/p3p-validator/20010928/xml.pl?annotate=1.3&sortby=file
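
For anyone who prefers a shorter form, the same check can probably be written as a single anchored substitution. This is only a sketch: the name strip_utf8_bom is made up for illustration and is not taken from the W3C code, and it assumes the document is held as raw bytes rather than as a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative, not from the original post):
# removes a leading UTF-8 BOM (bytes EF BB BF) from a byte string.
sub strip_utf8_bom {
    my ($doc) = @_;
    return $doc unless defined $doc;

    # \A anchors at the very start, so only a leading BOM is removed.
    $doc =~ s/\A\xEF\xBB\xBF//;
    return $doc;
}

# Example input: one snippet saved with a BOM, one without.
my $with_bom    = "\xEF\xBB\xBF<html></html>";
my $without_bom = "<html></html>";

print strip_utf8_bom($with_bom), "\n";
print strip_utf8_bom($without_bom), "\n";
```

Input without a BOM, or shorter than three bytes, passes through unchanged, so it should be safe to call on every snip file.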

Thanks to all for the help.

-Jalal





From: Brian Stell <bstell(_at_)ix(_dot_)netcom(_dot_)com>
To: perl-unicode(_at_)perl(_dot_)org
Subject: Re: Strange characters when displaying html files saved in UTF-8 format
Date: Tue, 11 Dec 2001 11:34:09 -0800

Jalal,

Kindly reply via the mailing list so others can see the discussion.
That way others can benefit and/or help.

BOM is the Byte Order Mark (U+FEFF), which Unicode uses at the start of a
data stream to indicate its byte order. In UTF-8 it appears as the three
bytes EF BB BF.
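
To make that concrete, here is a small sketch that prints the bytes U+FEFF turns into under a few encodings. It assumes a Perl that ships the Encode module (5.8 or later); the helper name bom_bytes is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (illustrative name): returns the bytes that the
# byte order mark U+FEFF becomes in the given encoding.
sub bom_bytes {
    my ($encoding) = @_;
    return encode($encoding, "\x{FEFF}");
}

for my $encoding ("UTF-8", "UTF-16BE", "UTF-16LE") {
    # Show each byte as two hex digits.
    my $hex = join " ", map { sprintf "%02X", $_ } unpack("C*", bom_bytes($encoding));
    print "$encoding: $hex\n";
}
```

The UTF-8 line should show EF BB BF. Those three bytes at the start of a file are what a browser may display as stray characters when it does not treat them as a BOM.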

Perhaps the Perl people can describe how to inhibit the BOM?

Jalal Kakavand wrote:
>
> Hi there,
>
> I don't know what BOM is?!
> I have the same issue with both IE6 and Netscape 4.7, and this is my final
> page with that issue:
>
> http://www.khaterat.com/
>
> As you can see, there is an extra blank line at the top of the page and at
> the start of the other snip files.
> BTW, the OS is Linux/Unix, I'm using Notepad to save my html files in UTF-8
> format, and I don't use any special Perl Unicode modules.
>
> Thanks,
>
> jalal
>
> >From: Markus Kuhn <Markus(_dot_)Kuhn(_at_)cl(_dot_)cam(_dot_)ac(_dot_)uk>
> >To: "Jalal Kakavand" <awiar(_at_)hotmail(_dot_)com>
> >CC: perl-unicode(_at_)perl(_dot_)org
> >Subject: Re: Strange characters when displaying html files saved in UTF-8
> >format
> >Date: Tue, 11 Dec 2001 14:36:02 +0000
> >
> >"Jalal Kakavand" wrote on 2001-12-10 23:45 UTC:
> > > I use Windows Notepad for typing my html snip files and then save
> > > them in UTF-8 format. Then, in my Perl program, after reading those snip
> > > files and printing them to the browser, there is a strange character at
> > > the start of each snip!! How can I remove those extra chars? It's a kind
> > > of newline character.
> >
> >Is it the BOM?
> >
> >http://www.cl.cam.ac.uk/~mgk25/unicode.html
> >
> >Markus
> >
> >--
> >Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> >Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
> >
>

--
Brian Stell
mailto:bstell(_at_)ix(_dot_)netcom(_dot_)com



