At 1:12 am +0200 26/10/03, Marco Baroni wrote:
> I am new to (explicit) unicode handling, and right now I am facing
> this problem.
> I have some data (lots of data) that in theory should be in ascii
> (with entity references in place of non-ascii characters). I have
> no easy way to get to know exactly how these data were generated.
Presumably you have some idea what OS the files were created on. If
they are MacRoman files, whether or not they also happen to be plain
us-ascii, then you might try something like the script below. The
first part of the script simply creates a sample file for testing
purposes:
#!/usr/bin/perl -w
# Part 1: write some MacRoman sample text to /tmp/some.txt
# (the script itself is assumed to be saved as MacRoman, so this
# literal goes out as MacRoman bytes).
my $text = "/tmp/some.txt";
open TEXT, ">", $text or die "can't write $text: $!";
print TEXT 'œ∑鮆¥üîøπ';
##### `open -a 'SimpleText' $text`; # if you like
close TEXT;

# Part 2: re-read the file as MacRoman and write it out as utf-8 HTML.
# The encoding pragma also makes literals from here on (e.g. the ∑ in
# the substitution below) be read as MacRoman.
use encoding "MacRoman", STDOUT => "utf8";
my $top = q(<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Some chars</title>);
my $html = "/tmp/some.html";
open HTML, ">:encoding(utf8)", $html or die "can't write $html: $!";
print HTML $top;
# copy the contents of some.txt into the html file as utf-8
open TEXT, "<:encoding(MacRoman)", $text or die "can't read $text: $!";
while (<TEXT>) {
    s~∑~S~g;       # example substitution on the decoded text
    print HTML;
}
close TEXT;
close HTML;        # flush before handing the file to Safari
`open -a 'Safari' $html`;
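
As for telling whether the data really are plain us-ascii in the first
place: a quick scan along these lines (just a sketch of mine, not part
of the script above; the file name and the MacRoman guess are only
examples) reports any non-ascii bytes, and the second loop shows one
way of replacing such characters with numeric character references
once you do know the encoding:

#!/usr/bin/perl -w
# Rough check: is the file really plain us-ascii?  Report any
# non-ascii bytes found on each line.
use strict;

my $file = shift || "/tmp/some.txt";

open IN, "<", $file or die "can't read $file: $!";
binmode IN;                       # look at raw bytes, no decoding
my $n = 0;
while (<IN>) {
    $n++;
    if (my @bad = /([\x80-\xFF])/g) {
        printf "line %d: non-ascii byte(s) %s\n",
               $n, join " ", map { sprintf "0x%02X", ord } @bad;
    }
}
close IN;

# If the bytes turn out to be MacRoman, decode them and replace each
# non-ascii character with a numeric character reference (the &#...;
# values are Unicode code points, not raw byte values).
open IN, "<:encoding(MacRoman)", $file or die "can't read $file: $!";
while (<IN>) {
    s/([^\x00-\x7F])/sprintf "&#%d;", ord $1/ge;
    print;
}
close IN;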