perl-unicode

Re: Mixing Unicode and Byte output on a Unicode enabled Perl 5.8.0

2003-10-10 01:30:05
One Small Doubt

The only area of doubt I have about this problem being caused by the base Perl 
and its configuration results from having the MIME::Lite and MIME::Base64 
modules available. I would expect both of these to have access to the encode 
features, but neither is used in this code module. They are used in other 
modules elsewhere in the CGI but have no connection to the troublesome module.

The Pound Sterling

The pound is definitely odd; from memory the PC originally allowed the £ to 
replace the # and you could have one or the other. Then 8-bit character sets 
changed things so you could have both, and the pound moved into the range 
beyond the ASCII defined characters: 0x9C in codepage 850 and 0xA3 in 
ISO-8859-1. A somewhat checkered history.

Now in my Red Hat environment "LANG=en_GB.UTF-8" is set, and I think this is 
causing Perl to render the £ in the two-byte format 0xC2 0xA3, whereas in the 
source the one-byte 0xA3 is used and understood. So the input/source is not 
encoded but the output is encoded, and I don't really understand why.
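
A minimal sketch of the two byte sequences involved, assuming the core Encode 
module is available (it ships with 5.8). It only shows how the same character, 
U+00A3, serialises as ISO-8859-1 and as UTF-8; it does not reproduce the CGI 
itself. One plausible reading of the symptom, offered as an assumption rather 
than a diagnosis, is that Perl 5.8.0 put a UTF-8 layer on STDOUT because of the 
UTF-8 locale, so the one-byte source character was re-encoded on output. The 
helper name hex_bytes is made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as upper-case hex digits.
sub hex_bytes {
    my ($bytes) = @_;
    return uc(unpack("H*", $bytes));
}

# U+00A3 POUND SIGN held as a one-character Perl string.
my $pound = "\x{A3}";

# Serialise the same character in the two encodings under discussion.
my $latin1 = encode("iso-8859-1", $pound);   # what the source file contains
my $utf8   = encode("UTF-8",      $pound);   # what a UTF-8 output layer emits

print "ISO-8859-1: ", hex_bytes($latin1), "\n";
print "UTF-8:      ", hex_bytes($utf8),   "\n";
```

The first line should report A3 and the second C2A3, which matches the one-byte 
source and the two-byte output described above.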

Equally, so far the £ seems to be the only character affected in this way.

However, now that I have the "no encoding;" pragma in force, everything is 
rendered as one-byte characters.

I love Perl but I am not sure that this part is very transparent. I would have 
expected the norm to be to follow the input/source and only do translation on 
instruction. Equally, as the "use bytes;" pragma is supposed to force characters 
to be treated as "almost binary", I expected it to stop the two-byte rendering.
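
A hedged sketch of why "use bytes;" may not help here: as far as I understand 
it, the pragma changes how operators see a string inside its lexical scope, 
while the bytes that reach a filehandle are decided by the I/O layers on that 
handle. The example writes the byte 0xA3 to an in-memory buffer through two 
different layers, with "use bytes;" in force around the print. The sub name 
written_bytes is made up for the illustration, and the in-memory handles stand 
in for STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: print one pound sign (byte 0xA3) through a handle
# opened with the given layer, and return what reached the buffer as hex.
sub written_bytes {
    my ($layer) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open failed: $!";
    {
        use bytes;              # in force for the print, as in the report
        print {$fh} "\xA3";
    }
    close($fh) or die "close failed: $!";
    return uc(unpack("H*", $buffer));
}

my $via_utf8 = written_bytes(":utf8");
my $via_raw  = written_bytes(":raw");

print "with :utf8 layer: $via_utf8\n";
print "with :raw layer:  $via_raw\n";
```

If the handle carries a :utf8 layer the output should be C2A3 even with the 
pragma in scope; with :raw it should stay A3. On that reading, something like 
binmode(STDOUT, ":bytes") would be the thing to try, rather than the pragma.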

I think this area of 5.8, whilst better than 5.6, may still need some 
clarification before the average user can understand it easily.

Frank


John Delacour <JD(_at_)BD8(_dot_)COM> 10/10/03 00:25:07 >>>
At 4:05 pm +0100 9/10/03, Frank Smith wrote:

 I have now forced Perl to produce unencoded output by the use of:

 no coding;

 which has worked wonders.

no encoding, I presume you mean.  That makes no difference here.


On the other hand if I run this

use encoding "utf8", STDOUT => "MacRoman" ;
print "\x{2022}" ;

I get the one-byte Mac bullet instead of the 
three-byte utf8 character I would get with just

print "\x{2022}" ;
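
The same comparison can be sketched without touching STDOUT, assuming the core 
Encode module and its "MacRoman" encoding are available. This is only an 
illustration of the byte counts mentioned above, not of what the encoding 
pragma does internally; hex_bytes is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as upper-case hex digits.
sub hex_bytes {
    my ($bytes) = @_;
    return uc(unpack("H*", $bytes));
}

# U+2022 BULLET as a one-character Perl string.
my $bullet = "\x{2022}";

# The bytes each output encoding would produce for that character.
my $mac  = encode("MacRoman", $bullet);   # one byte in Mac OS Roman
my $utf8 = encode("UTF-8",    $bullet);   # three bytes in UTF-8

print "MacRoman: ", hex_bytes($mac),  "\n";
print "UTF-8:    ", hex_bytes($utf8), "\n";
```

This should report A5 for MacRoman and E280A2 for UTF-8.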

There seems to be something odd about the "£". 
Perl on my machine prints it in one byte whatever 
I do.  Maybe something to do with locale settings.

JD



