Re: How to handle unicode strings in utf8 and pre-utf8 pragma perls

David Graff wrote:

If I understand Nicholas Clark's suggestion, it would mean that for any
perl version prior to 5.8.0, the script won't compile unless "if.pm"
has been installed from CPAN.

The fact that "if.pm" exists and is usable on older perl5 versions is
really good news, but it still might be a hurdle for some users who
depend on remote web-server sys-admins (or other uncontrollable forces)
for perl support...


But as the modules I'm writing are not core perl modules, they'd have to be
installed anyway - I guess that's a problem whatever way I do it.

I've got to say if.pm looks like a brilliantly simple way of handling my
problem.
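
For anyone following along, here is a minimal sketch of how I read the
if.pm approach. The 5.006 cut-off and the sample character are my own
assumptions for illustration, not something lifted from Nicholas's
message; the idea is just that the utf8 pragma gets loaded only on perls
that actually ship it.

```perl
use strict;

# Load the utf8 pragma only when the running perl is 5.6 or later
# (assumed threshold). On older perls the condition is false and
# utf8.pm is never requested, but if.pm itself must still be installed.
use if $] >= 5.006, 'utf8';

# Hypothetical sample: a single Cyrillic letter (U+04E9) typed directly
# into the UTF-8 encoded source file.
my $sample = "ө";

# With the pragma active this counts characters rather than bytes.
print "length: ", length($sample), "\n";
```

With the pragma in effect the literal counts as one character; without
it perl would see the two raw bytes of the UTF-8 encoding.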

In any case, one work-around for handling utf8 text in a version-neutral
way would be to store this text in a file, not hard-coded into the perl
script; then decide how to read the file, depending on the version; e.g.

 open( DAYS, "< day_names.utf8" ) or die "Cannot open day_names.utf8: $!";
 binmode( DAYS, ":utf8" ) if ( $] >= 5.008 );  # the :utf8 layer needs 5.8 or later
 @day_names = <DAYS>;
 close DAYS;

Depending on what you do with the data elsewhere in your script, I'm not
sure whether 5.6 will treat the data as utf8 characters when it is read
from a file like this (5.6 does not support "binmode FH, ':utf8'"), but
there's a good chance that it will work.
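
One way to find out on a given perl would be to count characters after
reading. The following is only a sketch under my own assumptions: the
scratch file name is made up, and the test data is the UTF-8 encoding of
a single code point (U+04E9).

```perl
use strict;

# Hypothetical scratch file; any writable path would do.
my $file = "utf8_check.tmp";

# Write the two bytes that encode U+04E9 in UTF-8, plus a newline.
open( OUT, "> $file" ) or die "Cannot write $file: $!";
binmode( OUT );
print OUT "\xD3\xA9\n";
close OUT;

# Read the line back, using the same version test as for the day names.
open( IN, "< $file" ) or die "Cannot read $file: $!";
binmode( IN, ":utf8" ) if ( $] >= 5.008 );
my $line = <IN>;
close IN;
unlink $file;

chomp $line;

# 1 means the line was decoded into a single character;
# 2 means perl still sees two raw bytes.
my $count = length $line;
print "length is $count\n";
```

If the count comes back as 2, the strings are still byte strings and
anything that depends on character semantics (length, substr, regex
character classes) will need extra care.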

You can also attach this text content at the end of your script, in a
__DATA__ segment, and set DATA as the file handle in the code sample
shown above (rather than DAYS).

Of course even using __DATA__, it can get tedious and hard to maintain
if you have a lot of little string constants scattered throughout.


Thanks - these are useful ideas which I'll use in some other modules I'm
doing, but if.pm just feels right for what I'm trying to do ATM.

(P.S.: for some reason, three of the characters in your first string
didn't map to proper Cyrillic code points for me: \u04e9 and the two
occurrences of \u04af -- I don't know the language, but were those
typos?)


Ah, I picked the example at random - I'm using data from the OpenI18N/ICU
locales, and looking at the Kirghiz locale using the IBM ICU
LocaleExplorer:

  http://oss.software.ibm.com/cgi-bin/icu/lx/en/utf-8/?_=ky

I see the same result. The page also says, at the top:

"Note: You're viewing an experimental locale. This locale is not part of the
official ICU installation! Please do not file bugs against this locale"

So who knows!

I hate having to use languages that I don't understand and, based on
feedback so far, there are problems with the ICU data as it stands.

But I suppose a "comprehensive" set of locale date modules consisting of
English and basic French wouldn't be quite so useful ;->

Thanks for the feedback,
-- 
Richard Evans
scriptyrich(_at_)yahoo(_dot_)co(_dot_)uk