perl-unicode

Actually doing the work changing XS modules

2002-01-29 14:39:08
Hi,

I'm one of the people who actually like the way perl 5.6 and up is
moving wrt utf8.
Most data coming into my programs is already in UTF-8 nowadays: XML or
database data, where we also use it inside the database. XML::Parser
already sets the right bit, but DBD::Oracle doesn't yet.

I've hacked something in so that it checks the idiotic NLS_LANG Oracle
environment variable for UTF8 and, if it finds it, does an SvUTF8_on()
on string data coming in from Oracle, which looks like it does the right
thing.
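
At the Perl level, the effect of that check can be sketched roughly as
follows. This is only an illustration, not the actual DBD::Oracle patch:
the function name mark_oracle_string is made up, and it uses
utf8::decode() (available from perl 5.8) in place of the raw SvUTF8_on()
call, because utf8::decode() leaves the string alone when the bytes are
not valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: $bytes is raw string data as fetched from Oracle,
# $nls_lang is the value of the NLS_LANG environment variable.
sub mark_oracle_string {
    my ($bytes, $nls_lang) = @_;

    # Only touch the data when NLS_LANG mentions UTF8,
    # e.g. "AMERICAN_AMERICA.UTF8".
    return $bytes unless defined $nls_lang && $nls_lang =~ /UTF8/i;

    my $string = $bytes;
    utf8::decode($string);    # in place; no change if not valid UTF-8
    return $string;
}

# Two UTF-8 bytes for e-diaeresis become one character.
print length(mark_oracle_string("\xc3\xab", "AMERICAN_AMERICA.UTF8")), "\n";  # 1
```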

Another module which I thought could benefit from some hacking is
Text::Unaccent, which uses iconv in its original form. It turned out to
be very easy to make it into a utf8-only module which is probably a lot
faster and smaller too, with no dependencies on external libraries.
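
To give an idea of what a utf8-only unaccent amounts to, here is a
pure-Perl sketch. It is not the modified Text::Unaccent code (which is
XS); the %base table and the unaccent_sketch name are invented for the
example, and the table covers only a handful of characters.

```perl
use strict;
use warnings;

# Tiny illustrative table: accented character => unaccented replacement.
my %base = (
    "\x{e0}" => 'a',    # a with grave
    "\x{e9}" => 'e',    # e with acute
    "\x{eb}" => 'e',    # e with diaeresis
    "\x{f1}" => 'n',    # n with tilde
);

# Replace every non-ASCII character that has a table entry;
# characters without an entry are passed through unchanged.
sub unaccent_sketch {
    my ($text) = @_;
    $text =~ s/([^\x{00}-\x{7f}])/exists $base{$1} ? $base{$1} : $1/ge;
    return $text;
}

print unaccent_sketch("caf\x{e9} ma\x{f1}ana"), "\n";    # cafe manana
```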

In both cases I still need to clean the modules up and see whether their
authors like what I did to them :-)

When I want to get my data *out* of perl, utf-8 is fine unless it has to
be HTML for old browsers. For that it's easy to use a regexp to search
for [^\x{00}-\x{7f}] and replace it with either an entity like &euml; or
a numeric entityref like &#902;.
Note that I don't even *care* whether the UTF8 bit is still on after
that transform, since I know it's all 7-bit ASCII anyway.
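
A minimal sketch of that substitution, assuming a small lookup table for
the named entities (the %named hash and the to_ascii_html name are made
up for the example; anything not in the table falls back to a decimal
numeric reference):

```perl
use strict;
use warnings;

# Illustrative table: code point => HTML entity name.
my %named = (
    0xE9 => 'eacute',
    0xEB => 'euml',
);

# Turn every character outside 7-bit ASCII into an entity.
sub to_ascii_html {
    my ($text) = @_;
    $text =~ s{([^\x{00}-\x{7f}])}{
        my $cp = ord $1;
        exists $named{$cp} ? "&$named{$cp};" : sprintf('&#%d;', $cp);
    }ge;
    return $text;
}

print to_ascii_html("No\x{eb}l \x{386}"), "\n";    # No&euml;l &#902;
```

The result contains only characters below 0x80, so it is the same byte
sequence whether or not perl still has the UTF8 bit set on it.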

It would be nice if the above transform were easy using Encode, and if
it were, it would be nice if Encode would work with 5.6.1, but I'll
manage regardless.
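
For comparison, the Encode module as shipped with perl 5.8 and later has
a fallback flag, Encode::FB_HTMLCREF, that produces decimal numeric
references for anything the target encoding cannot represent. A sketch,
assuming such a perl is available (it does not help on 5.6.1, and it
never emits named entities):

```perl
use strict;
use warnings;
use Encode ();

# Encode to ASCII, replacing unrepresentable characters with &#NNN;.
# encode() may modify its input when a CHECK flag is given, so work
# on a copy.
sub to_ascii_html_encode {
    my ($text) = @_;
    my $copy = $text;
    return Encode::encode('ascii', $copy, Encode::FB_HTMLCREF);
}

print to_ascii_html_encode("No\x{eb}l \x{386}"), "\n";    # No&#235;l &#902;
```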

-- 
Bart.

  • Actually doing the work changing XS modules, Bart Schuller <=