Re: Unicode Normalization Forms

* SADAHIRO Tomoyuki wrote:

To implement C<normalize> with two parameters,
it needs to be determined how to handle the error
when an invalid form name is passed in.

1) croak
2) carp and return false
3) only return false
4) use default (what is default?)
5) another... 

I think it should croak, like "Invalid type in pack".


Yes, a 'croak "Unknown form '$form' in normalize()"' would be
fine.
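
A minimal sketch of that behaviour, just to make the idea concrete. The
form names (C, D, KC, KD), the %FORM table and the argument order are
assumptions for illustration, and the per-form routines are stand-ins
that return their input unchanged, not a real normalizer:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical dispatch table. Each entry is a placeholder that returns
# the string untouched; a real module would do the actual work here.
my %FORM = (
    C  => sub { $_[0] },
    D  => sub { $_[0] },
    KC => sub { $_[0] },
    KD => sub { $_[0] },
);

sub normalize {
    my ($string, $form) = @_;

    # Die from the caller's point of view on an unknown form name,
    # much like pack does for an invalid template character.
    my $code = $FORM{$form}
        or croak "Unknown form '$form' in normalize()";

    return $code->($string);
}
```

With croak the caller can still wrap the call in an eval block and
inspect $@ if it wants to recover instead of dying.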

If C<normalize> takes two parameters,
it may be better if the 1st is a form name
and the 2nd is a string to be normalized.

cf. printf FORMAT, LIST
   pack TEMPLATE,LIST


They probably take more than two parameters...

   split REGEX, STRING


Hm, yes... though I like

  normalize( $string => 'C' );

better than

  normalize( 'C' => $string );

since my first question would be "normalize? what?"
-> "normalize $string with normalization form 'C'"

I would also suggest making the form an optional parameter
and default to 'C' since it is the most often required form
e.g. for W3C-normalized text [1] or normalized text in IETF
Protocols [2] (both currently drafts, anyway). To implement
some default form, it would be required to make the form
the second parameter.
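
Roughly what I have in mind for the argument handling, as a sketch only.
normalize_args is a hypothetical helper name, and the list of accepted
forms (C, D, KC, KD) is an assumption; it only resolves the arguments
and does no normalization itself:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Assumed set of accepted form names.
my %IS_FORM = map { ($_ => 1) } qw(C D KC KD);

# Hypothetical helper: string first, optional form second.
sub normalize_args {
    my ($string, $form) = @_;

    # Fall back to 'C' when the caller leaves the form out.
    $form = 'C' unless defined $form;

    croak "Unknown form '$form' in normalize()"
        unless $IS_FORM{$form};

    return ($string, $form);
}

my ($s1, $f1) = normalize_args("caf\x{e9}");          # form omitted
my ($s2, $f2) = normalize_args("caf\x{e9}", 'KD');    # form given
```

With the form in front, a single argument would be ambiguous (is it a
form name or the string?), which is why the default only works cleanly
when the form comes last.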

What's wrong with Unicode::Normalize?


As yet I don't know
whether the Unicode:: category is available at present,
nor what other name would be appropriate.  :-(


Maybe Jarkko can give us a hint?

[1] http://www.w3.org/TR/charmod/#sec-TextNormalization
[2] urn:ietf:i-d:draft-duerst-i18n-norm-04.txt
-- 
Björn Höhrmann { mailto:bjoern(_at_)hoehrmann(_dot_)de } 
http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/