Turns out all of that was unnecessary: the program would have saved
and restored the UTF-8 fine if I just hadn't tried to blindly untaint the data
with...
sub untaint_blind {
    $_[0] =~ /^(.*)$/;
    my $ret = $1;
    $ret;
}
This is perl5.6.1
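
For comparison, here is a minimal sketch of a checked variant. The name untaint_checked is hypothetical and not from the thread. It keeps the same accept-anything policy but only uses $1 after a successful match, since a failed match leaves $1 holding whatever an earlier match set. The /s modifier with \A and \z lets the pattern cover strings with embedded newlines, which /^(.*)$/ does not. The message doesn't say why the blind untaint damaged the UTF-8 data, so this is not offered as a fix for that on 5.6.1.

```perl
use strict;
use warnings;

# Hypothetical replacement for untaint_blind (name is an assumption).
# Still accepts any content, so it provides no real validation.
sub untaint_checked {
    my ($value) = @_;
    return unless defined $value;

    # \A and \z anchor the whole string; /s lets . match newlines too.
    if ($value =~ /\A(.*)\z/s) {
        return $1;    # only read $1 after a successful match
    }
    return;           # undef in scalar context signals "no match"
}

my $data = untaint_checked("first line\nsecond line\n");
print defined $data ? "ok\n" : "rejected\n";
```

A real untaint routine would normally match against a pattern describing what the data is allowed to contain, rather than (.*).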
-----Original Message-----
From: Jarkko Hietaniemi [mailto:jhi(_at_)iki(_dot_)fi]
Sent: Friday, January 10, 2003 1:39 PM
To: Merijn van den Kroonenberg
Cc: Narins, Josh; perl-unicode(_at_)perl(_dot_)org
Subject: Re: beginniner's 5.6.1 latin1<->utf8 question
On Fri, Jan 10, 2003 at 07:28:00PM +0100, Merijn van den Kroonenberg wrote:
You might be looking for these:
# ISO 8859-1 to UTF-8
s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
# UTF-8 to ISO 8859-1
s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
I think that will work (they are not mine, so don't blame me if not
;-)
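
As a quick check, the two substitutions above can be wrapped in subs and round-tripped. This is only a sketch: the sub names latin1_to_utf8 and utf8_to_latin1 are made up for illustration, and it assumes plain byte strings (no "use utf8", no Encode).

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two substitutions quoted above.
sub latin1_to_utf8 {
    my ($s) = @_;
    # Each byte 0x80-0xFF becomes a two-byte UTF-8 sequence.
    $s =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $s;
}

sub utf8_to_latin1 {
    my ($s) = @_;
    # Only lead bytes 0xC2 and 0xC3 are handled, i.e. U+0080 to U+00FF.
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $s;
}

my $latin1 = "caf\xE9";                 # "cafe" with e-acute, in Latin-1
my $utf8   = latin1_to_utf8($latin1);

# Dump the bytes in hex to see the two-byte sequence.
print join(' ', map { sprintf '%02X', ord($_) } split(//, $utf8)), "\n";
# prints "63 61 66 C3 A9"

print utf8_to_latin1($utf8) eq $latin1 ? "round trip ok\n" : "mismatch\n";
```

UTF-8 sequences for code points above U+00FF pass through utf8_to_latin1 unchanged, since Latin-1 cannot represent them.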
They are mine :-) so I feel free to say that they don't do &#NNN; conversion...
but they certainly could be changed to work so.
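
One way they might be changed, as a sketch only: because Latin-1 byte values equal the Unicode code points, the replacement can emit a decimal &#NNN; reference instead of UTF-8 bytes. The sub names below are hypothetical, and the UTF-8 variant assumes only two-byte sequences (up to U+00FF) need handling.

```perl
use strict;
use warnings;

# Hypothetical helper: Latin-1 bytes above ASCII become &#NNN; references.
sub latin1_to_entities {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/eg;
    return $s;
}

# Hypothetical helper for input that is already UTF-8. It reuses the
# bit arithmetic of the UTF-8 to ISO 8859-1 substitution to recover
# the code point, then prints it as a decimal reference.
sub utf8_to_entities {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/'&#' . (ord($1)<<6&0xC0|ord($2)&0x3F) . ';'/eg;
    return $s;
}

print latin1_to_entities("f\xEAte"), "\n";      # prints "f&#234;te"
print utf8_to_entities("f\xC3\xAAte"), "\n";    # prints "f&#234;te"
```

Neither helper escapes &, < or > in the input, so that would still have to be done separately before sending the text out as HTML.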
Greetings, Merijn
----- Original Message -----
From: "Narins, Josh" <josh(_dot_)narins(_at_)lehman(_dot_)com>
To: <perl-unicode(_at_)perl(_dot_)org>
Sent: Friday, January 10, 2003 6:54 PM
Subject: beginniner's 5.6.1 latin1<->utf8 question
At one point I had a regex which perfectly converted the string A below
into a series of &#234; strings.
This is nice for me, because I just sling them out on the web, and
as entities, they always seem to work.
I've lost the regex, can't seem to find it. I know it had chr or ord in it.
I've been reading the perl-unicode archives, and googling, but I just don't
see it.
This is for perl5.6.1 with Sun's (reputedly?) sick iconv.
If someone could tap me in the right direction...
Thx in advance
------------------------------------------------------------------------------
This message is intended only for the personal and confidential use of the
designated recipient(s) named above. If you are not the intended
recipient of this message you are hereby notified that any review,
dissemination, distribution or copying of this message is strictly
prohibited. This communication is for information purposes only and
should not be regarded as an offer to sell or as a solicitation of an
offer to buy any financial product, an official confirmation of any
transaction, or as an official statement of Lehman Brothers. Email
transmission cannot be guaranteed to be secure or error-free.
Therefore, we do not represent that this information is complete or
accurate and it should not be relied upon as such. All information is
subject to change without notice.
--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen