perl-i18n

Re: Perl i18n/l10n limitations?

2003-04-03 13:07:50
Hi there, thanks for the reply.

Autrijus Tang wrote:
> On Tue, Apr 01, 2003 at 09:35:17PM +0000, Rich wrote:
> > Built in support seems limited to charsets, collation, and basic number
> > formatting (printf etc). To query localization information, you have to
> > rely on POSIX functionality which may or may not be present, and the
> > information returned varies from platform to platform (what on earth do
> > you do on Win32?)
>
> This is not true.  Locale::Maketext uses Win32::Locale and HTTP header
> parsing where necessary.

OK, but I was discussing built-in support, which is supplied via the locale
pragma. Is Locale::Maketext now the official Perl localization API? (That's
a genuine question, BTW; I'm not being sarcastic :)
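
For reference, the built-in route I mean looks roughly like the sketch below.
It uses only the core POSIX module (setlocale and localeconv). Which keys come
back, and with what values, depends on the platform and on the locales
installed there, which is exactly the portability problem I was describing.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL localeconv);

# Adopt whatever locale the environment requests. setlocale returns undef
# when the requested locale is not available on this system.
my $active = setlocale(LC_ALL, "");
print "Active locale: ", (defined $active ? $active : "(unavailable)"), "\n";

# localeconv() returns a hash reference. Fields with no value in the
# current locale may be missing, so the key set is not fixed.
my $lc = localeconv();
for my $key (sort keys %$lc) {
    my $value = $lc->{$key};
    # The grouping fields are packed small integers, not printable text.
    $value = join ",", unpack "C*", $value if $key =~ /grouping\z/;
    printf "%-20s %s\n", $key, $value;
}
```

Running this under different LC_ALL settings, or on different operating
systems, is a quick way to see how much the returned information varies.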

> > The future of perl appears to be utf8, but perlunicode states: "Use of
> > locales with Unicode is discouraged" - this removes perl's built in
> > support at a stroke.
>
> No, it says that locale.pm is to be discouraged, not using its
> information for interesting l10n applications.

Yeah, that's what I meant - you shouldn't use the locale pragma, and you 
therefore lose its functionality.

> I highly encourage you to get acquainted with the Locale::Maketext
> framework, and its built-in support for transforming things into
> locale-specific equivalents via language subclasses.

Well, I have used Locale::Maketext, but I'm not convinced that it's the
answer to everything... yet. It makes message translation easy, but there's
little support for anything else, the numf method excepted. Please correct
me if I'm wrong here, BTW.
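
To make sure we're talking about the same thing, here is a minimal sketch of
what I understand the language-subclass approach to be. Everything under
MyProgram:: is a made-up name for illustration, and the French numf override
is my own guess at a reasonable format, not something Locale::Maketext ships.

```perl
use strict;
use warnings;

{
    # Hypothetical project base class.
    package MyProgram::L10N;
    use parent 'Locale::Maketext';
}
{
    package MyProgram::L10N::en;
    use parent -norequire, 'MyProgram::L10N';
    our %Lexicon = (_AUTO => 1);   # unknown keys pass through unchanged
}
{
    package MyProgram::L10N::fr;
    use parent -norequire, 'MyProgram::L10N';
    our %Lexicon = (
        'Total count: [numf,_1]' => 'Nombre total : [numf,_1]',
    );

    # Rewrite the default "1,234,567.5" style with space grouping and a
    # decimal comma (a plain space is used here for simplicity).
    sub numf {
        my ($self, $num) = @_;
        my $text = $self->SUPER::numf($num);
        $text =~ tr/,./ ,/;
        return $text;
    }
}

package main;

# "en-GB" and "fr-FR" fall back to the "en" and "fr" classes above.
my $en = MyProgram::L10N->get_handle('en-GB') or die "no English handle";
my $fr = MyProgram::L10N->get_handle('fr-FR') or die "no French handle";

my $english = $en->maketext('Total count: [numf,_1]', 1234567);
my $french  = $fr->maketext('Total count: [numf,_1]', 1234567);

print "$english\n";   # Total count: 1,234,567
print "$french\n";    # Nombre total : 1 234 567
```

That works nicely for numbers, but numf is the only formatting hook of this
kind that I can find.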

IMHO it makes sense to have a standardized, consistent way to query
l10n information. In an ideal world, CPAN modules needing i18n would then
use this information as the basis of their l10n. Even better, it might be
possible to take the detailed locale information already provided on IBM's
ICU site to quickly build up a library of common locales (ICU license
permitting, of course).

So, if the Locale::Maketext handle were to provide such l10n information, 
that's fine, but it has to be provided somewhere. Information such as:

  language
  country
  variant
  date_datetime_format
  date_short_datetime_format
  date_date_format
  date_short_date_format
  date_time_format
  date_short_time_format
  date_calendar
  date_days
  date_abbreviated_days
  date_months
  date_abbreviated_months
  money_international_currency_symbol
  money_international_fractional_digits
  money_currency_symbol
  money_fractional_digits
  money_thousands_separator
  money_decimal_point
  money_pos_cs_precedes
  money_neg_cs_precedes
  money_pos_separate_space
  money_neg_separate_space
  money_pos_sign_position
  money_neg_sign_position
  money_pos_sign
  money_neg_sign
  money_grouping
  number_decimal_point
  number_thousands_separator
  number_grouping
  telecom_international_dialing_prefix

  ...and so on.
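
As a rough sketch of how such attributes might be consumed, the snippet below
stores a few of them in plain hashes and formats a number from them. The
storage layout and the format_number function are assumptions of mine, the
grouping is simplified to a single group size, and a plain space stands in
for the non-breaking space that real locale data often uses for French.

```perl
use strict;
use warnings;

# Illustrative attribute records keyed by language tag. The field names
# follow the list above; the hash layout is only an assumption.
my %LOCALE = (
    'en-GB' => {
        language                   => 'en',
        country                    => 'GB',
        number_decimal_point       => '.',
        number_thousands_separator => ',',
        number_grouping            => 3,
    },
    'fr-FR' => {
        language                   => 'fr',
        country                    => 'FR',
        number_decimal_point       => ',',
        number_thousands_separator => ' ',
        number_grouping            => 3,
    },
);

# Format a plain decimal number using only the attributes of one record.
sub format_number {
    my ($attr, $number) = @_;
    my ($sign, $int, $frac) = $number =~ /\A(-?)(\d+)(?:\.(\d+))?\z/
        or die "unsupported number: $number";
    my $size = $attr->{number_grouping};
    my $sep  = $attr->{number_thousands_separator};
    # Insert the separator between digit groups, working from the right.
    1 while $int =~ s/\A(\d+)(\d{$size})/$1$sep$2/;
    my $tail = defined $frac ? $attr->{number_decimal_point} . $frac : '';
    return $sign . $int . $tail;
}

my $uk     = format_number($LOCALE{'en-GB'}, 1234567.5);
my $french = format_number($LOCALE{'fr-FR'}, 1234567.5);

print "$uk\n";       # 1,234,567.5
print "$french\n";   # 1 234 567,5
```

The point is only that the formatting code never needs to know which
language it is formatting for; it reads everything from the record.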

And if the idea is for Locale::Maketext to provide the required 
functionality (or simply a standardized API), the API needs to be expanded:

  moneyf (currencyf ?), international_moneyf, short_datef, 
  datef, short_timef, timef, short_datetimef, datetimef,
  phonef, postalf ... etc

But in the case of something like datef methods, what arguments do you take?
A DateTime ref? A Date::Calc::Object ref? Individual fields? Is this even the
correct thing to do, since date objects may need locale information at
instantiation to determine the calendar, first day of the week, etc.?
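
One possible answer, sketched below purely for discussion, is to accept epoch
seconds and a per-locale strftime pattern. The short_datef function, the
%PATTERN table and its contents are hypothetical. This sidesteps the choice
of date class, but it cannot express calendar or first-day-of-week
differences, which is the concern raised above.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical per-locale patterns, using an attribute name from the
# list earlier in this message.
my %PATTERN = (
    'en-GB' => { date_short_date_format => '%d/%m/%Y' },
    'en-US' => { date_short_date_format => '%m/%d/%Y' },
);

# One candidate signature: language tag and epoch seconds in, string out.
sub short_datef {
    my ($tag, $epoch) = @_;
    my $format = $PATTERN{$tag}{date_short_date_format}
        or die "no short date format for $tag";
    # gmtime keeps the example independent of the local time zone.
    return strftime($format, gmtime $epoch);
}

my $epoch = 86400 * 31;   # 1 February 1970, 00:00 UTC

my $uk = short_datef('en-GB', $epoch);
my $us = short_datef('en-US', $epoch);

print "$uk\n";   # 01/02/1970
print "$us\n";   # 02/01/1970
```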

There are also situations where people wish to see, e.g., messages in one
language but numbers formatted in another. As it stands, a Locale::Maketext
L10N object provides both the lexicon and the formatting methods and is
therefore tied to a single language. Bracket notation only emphasizes this:

  $lh->maketext("Total count: [numf,_1]", $total_count);

What do I do if a user wants the message in English with French number
formatting? I obtained my handle with
$lh = MyProgram::L10N->get_handle("en-GB"), and bracket notation has
translated my numf call to $lh->numf($_[1]).

Well, I suppose the easiest thing is:

  my $lh = MyProgram::L10N->get_handle("en-GB");
  my $fh = MyProgram::L10N->get_handle("fr-FR");

  $lh->maketext("Total count: [_1]", $fh->numf($total_count));

But am I right in thinking that I've just pulled in the French lexicon 
simply to get numeric formatting?

A more flexible system, like POSIX LC_*-based formatting, allows users to
control formatting independently of the chosen lexicon language. In that
case you might take a more modular approach:

Locale/ Handle           - provides l10n attributes for use by any modules
                           needing i18n.
        Maketext         - message translation.
        NumberFormat     - number formatting.
        CurrencyFormat   - currency...
        DateTimeFormat   - date time...
        PostalFormat     - zip/post codes...
        TelecomFormat    - phone numbering...

Maybe my code now looks like:

  Locale::Maketext->set_default("en-GB");

  my $lh = MyProgram::L10N->get_handle
          (
            Lexicon        => "default",
            CurrencyFormat => "default",
            NumberFormat   => "fr-FR",
          );

  $lh->maketext("Total count: [numf,_1]", $total_count);  # French format

  $lh->set_numf_lang("en-GB");

  $lh->maketext("Total count: [numf,_1]", $total_count);  # UK format
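
To check that the idea is at least implementable, here is a small sketch that
runs against today's Locale::Maketext. It only covers set_numf_lang; the
numf_delegate slot and the format_number method are names I made up, and the
named-argument get_handle above is not implemented. Note that it still loads
the whole French class, lexicon included, so it does not yet give the
separation into modules that I'm arguing for.

```perl
use strict;
use warnings;

{
    # Hypothetical project base class.
    package MyProgram::L10N;
    use parent 'Locale::Maketext';

    # Choose a different language for number formatting only.
    sub set_numf_lang {
        my ($self, $tag) = @_;
        $self->{numf_delegate} = MyProgram::L10N->get_handle($tag)
            or die "no handle for $tag";
        return $self;
    }

    # numf forwards to the delegate when one is set, otherwise to itself.
    sub numf {
        my ($self, $num) = @_;
        my $target = $self->{numf_delegate} || $self;
        return $target->format_number($num);
    }

    # Default formatting is Locale::Maketext's own numf.
    sub format_number {
        my ($self, $num) = @_;
        return $self->SUPER::numf($num);
    }
}
{
    package MyProgram::L10N::en;
    use parent -norequire, 'MyProgram::L10N';
    our %Lexicon = (_AUTO => 1);
}
{
    package MyProgram::L10N::fr;
    use parent -norequire, 'MyProgram::L10N';
    our %Lexicon = ('Total count: [numf,_1]' => 'Nombre total : [numf,_1]');

    # Space grouping and decimal comma (plain space for simplicity).
    sub format_number {
        my ($self, $num) = @_;
        my $text = $self->SUPER::format_number($num);
        $text =~ tr/,./ ,/;
        return $text;
    }
}

package main;

my $lh = MyProgram::L10N->get_handle('en-GB') or die "no handle";
my $uk = $lh->maketext('Total count: [numf,_1]', 1234567);

$lh->set_numf_lang('fr-FR');
my $mixed = $lh->maketext('Total count: [numf,_1]', 1234567);

print "$uk\n";      # Total count: 1,234,567
print "$mixed\n";   # Total count: 1 234 567
```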

I admit this is a forced example, and that my comments are probably too 
simplistic... maybe wrong, but I'm fairly sure these are real issues.

The alternative to the above is that any module needing i18n provides 
constructors and/or l10n methods that take a locale handle and use that 
information to handle l10n internally, which is kind of the way Java works 
IIRC.

In that case, you might eventually end up with stuff like:

  my $h  = Locale::Handle->default;
  my $lh = MyProgram::L10N->get_handle($h);

  my $date     =       DateTime->new(locale => $h);
  my $currency = Math::Currency->new(locale => $h);

  ...

  print $lh->maketext("At [_1] your available funds were: [_2]",
                           $date->short_date,
                       $currency->to_string);
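
A toy version of that arrangement is sketched below. Sketch::LocaleHandle and
Sketch::Money are invented stand-ins, not the real Locale::Handle or
Math::Currency interfaces, and the attribute values are illustrative. It
handles positive amounts only and always puts a space between the
international currency symbol and the number.

```perl
use strict;
use warnings;

{
    # Hypothetical locale handle: a read-only bag of l10n attributes.
    package Sketch::LocaleHandle;
    sub new { my ($class, %attr) = @_; return bless {%attr}, $class }
    sub get {
        my ($self, $name) = @_;
        die "unknown attribute: $name" unless exists $self->{$name};
        return $self->{$name};
    }
}
{
    # Hypothetical consumer: formats an amount using only its handle.
    package Sketch::Money;
    sub new {
        my ($class, %arg) = @_;
        return bless { locale => $arg{locale}, amount => $arg{amount} }, $class;
    }
    sub to_string {
        my ($self) = @_;
        my $h = $self->{locale};
        my $fixed = sprintf '%.*f',
            $h->get('money_fractional_digits'), $self->{amount};
        my ($int, $frac) = split /\./, $fixed;
        my $sep = $h->get('money_thousands_separator');
        1 while $int =~ s/\A(\d+)(\d{3})/$1$sep$2/;
        my $number = defined $frac
            ? $int . $h->get('money_decimal_point') . $frac
            : $int;
        my $symbol = $h->get('money_international_currency_symbol');
        return $h->get('money_pos_cs_precedes')
            ? "$symbol $number"
            : "$number $symbol";
    }
}

package main;

my %common = (money_fractional_digits => 2);
my $gb = Sketch::LocaleHandle->new(%common,
    money_international_currency_symbol => 'GBP',
    money_thousands_separator => ',', money_decimal_point => '.',
    money_pos_cs_precedes => 1);
my $fr = Sketch::LocaleHandle->new(%common,
    money_international_currency_symbol => 'EUR',
    money_thousands_separator => ' ', money_decimal_point => ',',
    money_pos_cs_precedes => 0);

my $pounds = Sketch::Money->new(locale => $gb, amount => 1234.5)->to_string;
my $euros  = Sketch::Money->new(locale => $fr, amount => 1234.5)->to_string;

print "$pounds\n";   # GBP 1,234.50
print "$euros\n";    # 1 234,50 EUR
```

The formatted strings can then be passed to maketext as ordinary [_1] and
[_2] arguments, so the lexicon and the formatting stay independent.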

I suppose the bottom line is this - I'd love to see CPAN developers using a 
single, well developed framework for i18n instead of rolling their own, as 
the few modules I've looked at seem to. Locale::Maketext could well form 
that framework, but perhaps it needs more work to fulfill that role?

Does any of this seem reasonable? I'm happy to take some suggestions and 
hack some code to see if any of this would work in practice.

Thanks,
--
Rich
scriptyrich(_at_)yahoo(_dot_)co(_dot_)uk
