perl-unicode

CGI and UTF

2002-11-20 13:30:14
I'm having some problems with XML, UTF-8 and CGI variables in Perl 5.6.1.
 
I have attached an example of the problem (a sample input string is "Descripción"). You will need XML::Simple installed to run it.
 
The example takes an input string and prints it twice: once concatenated with a string from the XML data, and once on its own. The mangling occurs only when you concatenate a string from the XML parser with a string from CGI.
 
I'm not sure why this happens, but here is a first attempt at a theory. All XML parsing is done in UTF-8, but Perl has no idea of the encoding of incoming CGI data and assumes it is ISO-8859-1 (Latin-1). I read this somewhere but don't know if it's correct. String operations upgrade non-UTF-8 strings to UTF-8, so Perl converts the CGI string from ISO-8859-1 to UTF-8, which mangles it because it is already UTF-8.
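A minimal sketch of the theory above, in Perl. It assumes Perl 5.8 or later with the core Encode module (Encode is not available in 5.6.1, so this will not run on the version in question), and it fakes both inputs instead of using CGI.pm or XML::Simple: the hypothetical $cgi_bytes stands in for the raw UTF-8 bytes a form submission delivers, and the hypothetical $xml_text for the character string a UTF-8-aware XML parser returns.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they might arrive from a UTF-8 form submission:
# the character o-acute is the two bytes 0xC3 0xB3.
my $cgi_bytes = "Descripci\xC3\xB3n";

# A character string, as an XML parser would return it
# (simulated here by decoding the same bytes explicitly).
my $xml_text = decode('UTF-8', "Descripci\xC3\xB3n");

# Joining them makes Perl treat each byte of $cgi_bytes as a
# Latin-1 character, so 0xC3 and 0xB3 become two separate characters.
my $mangled = $xml_text . ' ' . $cgi_bytes;

# Decoding the CGI input first keeps both halves as the same characters.
my $fixed = $xml_text . ' ' . decode('UTF-8', $cgi_bytes);

# Encode all output as UTF-8 in one place.
binmode STDOUT, ':encoding(UTF-8)';
print "$mangled\n";    # Descripción DescripciÃ³n
print "$fixed\n";      # Descripción Descripción
```

Under these assumptions the fix is to decode CGI input once, as it enters the program, so every string is handled as characters; the output encoding is then chosen in a single binmode call.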
 
Can anyone point me in the right direction, explain where I'm going wrong, and maybe provide some useful links? There seems to be very little information on building internationalised web pages with UTF-8 and Perl 5.6.1.
 
Thanks
 
Mark
 

Attachment: testUTF8.pl
Description: testUTF8.pl
