Unicode::Collate 0.23 Released


Hi, all.

Unicode::Collate 0.23 has been released.


Changes from 0.21 to 0.23:

0.23  Wed Sep 04 19:25:20 2002
    - fix: scalar match() no longer returns an lvalue substr ref.
    - fix: "Ignorable after variable" should be made level 3 ignorable
           even if alternate => 'blanked'.
    - Now a grapheme may contain trailing level 2, level 3,
      and completely ignorable characters.

0.22  Mon Sep 02 23:15:14 2002
    - New File: index.t.
      (The new test.t excludes tests for index.)
    - tweak on index(). POSITION is supported.
    - add match, gmatch, subst, gsubst methods.
    - fix: ignorable after variable in 'shift'-variable weight.

The match, gmatch, subst, and gsubst methods work
like m//, m//g, s///, and s///g, respectively,
except that they do not interpret patterns;
they match only a literal substring.

e.g.

  use Unicode::Collate;
  my $Collator = Unicode::Collate->new(
     normalization => undef, level => 1,
  );
     # (normalization => undef) is REQUIRED.

  my $str = "Camel ass came\x{301}l CAMEL horse cAm\0E\0L...";

  $Collator->gsubst($str, "camel", sub { "<b>$_[0]</b>" });
    # cf. $str =~ s/(camel)/"<b>$1</b>"/egi;

Then all the camels are made bold-faced;
i.e. $str is converted to

   "<b>Camel</b> ass <b>came\x{301}l</b> <b>CA" .
       "MEL</b> horse <b>cAm\0E\0L</b>..."

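The other methods follow the same calling shape. The lines below
are a minimal sketch, not taken from the distribution: the shorter
sample string and the replacement text "dromedary" are made up for
illustration, and the return conventions shown (list-context
index(), match() and gmatch(), a plain string as the replacement
for subst()) are assumed from the module's documentation.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Same settings as above: no normalization, primary level only.
my $Collator = Unicode::Collate->new(
    normalization => undef, level => 1,
);

my $str = "Camel ass came\x{301}l CAMEL horse";

# index(): in list context, position and length of the first match.
my ($pos, $len) = $Collator->index($str, "camel");

# The optional third argument (POSITION) is where the search starts.
my ($pos2, $len2) = $Collator->index($str, "camel", 6);

# match(): in list context, the first matching part itself.
my ($first) = $Collator->match($str, "camel");

# gmatch(): in list context, every matching part.
my @all = $Collator->gmatch($str, "camel");

# subst(): replaces only the first matching part (cf. s/// without /g).
my $copy = $str;
$Collator->subst($copy, "camel", "dromedary");
```

As with gsubst, subst modifies its first argument in place (here
$copy), while index, match and gmatch leave $str untouched.
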
NB. Almost all control characters (TAB, LF, VT, FF, CR, and NEL are
    the exceptions) are completely ignorable, so arbitrary insertion
    of such characters does not affect matching under UCA
    (cf. UTS #10, 4.1, S1.3).
    E.g., "cAm\0E\0L", "ca\cA\cA\cAmel", etc. are
    correctly treated as equivalent to "camel".
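
The same point can be checked with a plain comparison instead of
a search. This is only a sketch: it assumes the collator's eq()
method and the same (normalization => undef, level => 1) settings
as above, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $Collator = Unicode::Collate->new(
    normalization => undef, level => 1,
);

# Strings with NUL or ^A characters inserted at arbitrary places.
my @variants = ("cAm\0E\0L", "ca\cA\cA\cAmel");

# 1 if the collator treats the variant as equal to "camel", else 0.
my @equal = map { $Collator->eq($_, "camel") ? 1 : 0 } @variants;

print "@equal\n";
```

If the inserted control characters are ignored as described, every
entry of @equal is 1.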

(P.S.) A small tweak for EBCDIC machines has been added,
       but I'm not sure it works properly.
       Any testing or help is welcome!

Regards,
SADAHIRO Tomoyuki