perl-unicode

Re: Unicode Normalization Forms

2001-08-11 01:37:47

On Sat, 11 Aug 2001 04:42:42 +0200
Bjoern Hoehrmann <derhoermi(_at_)gmx(_dot_)net> wrote:

* SADAHIRO Tomoyuki wrote:
If C<normalize> takes two parameters,
it may be better if the 1st is a form name
and the 2nd is a string to be normalized.

cf. printf FORMAT, LIST
   pack TEMPLATE, LIST

They probably take more than two parameters...

   split REGEX, STRING

Hm, yes... though I like

  normalize( $string => 'C' );

better than

  normalize( 'C' => $string );

since my first question would be "normalize? what?"
-> "normalize $string with normalization form 'C'"
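For comparison, the Unicode::Normalize module (bundled with Perl since 5.8) ended up with the form-first order, normalize(FORM, STRING), plus one-argument shortcuts such as NFD and NFC. A minimal sketch, assuming that module is installed. Note that `=>` is only a comma that also quotes a bareword on its left, so `normalize('D' => $string)` passes exactly the same arguments in the same positions as `normalize('D', $string)`; the fat comma changes how the call reads, not what it means:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(normalize NFD);

my $string = "\x{00E9}";    # LATIN SMALL LETTER E WITH ACUTE

# Form first, string second: the order Unicode::Normalize uses.
my $d1 = normalize('D', $string);

# The fat comma is just a comma, so this is the same call as above.
my $d2 = normalize('D' => $string);

# One-argument shortcut for the same form.
my $d3 = NFD($string);

# Print the code points of the decomposed string in hex.
printf "%s\n", join ' ', map { sprintf '%04X', ord } split //, $d1;
```

All three calls produce "e" followed by U+0301 COMBINING ACUTE ACCENT.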

The reason I think the order C<normalize FORM, STRING> would be
better is that it makes one-liners clearer to write and to read.

Compare

  join " ",
     map sprintf("%04X", $_),
     unpack 'U*',
     normalize 'D',
     pack 'U*',
     map hex(),
     split ' ', shift;

with

  join(" ", map sprintf("%04X", $_), unpack 'U*',
     normalize( pack('U*', map hex(), split ' ', shift(@_)), 'D'));

STRING may be a very, very long expression.

The former avoids the problem of unbalanced parentheses.
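A runnable version of the list-operator pipeline, as a sketch assuming Unicode::Normalize provides normalize (imported explicitly) and that the code points arrive as space-separated hex in the first command-line argument. The helper name decompose_hex and the fallback input are hypothetical. Each list operator consumes everything to its right, so the chain reads bottom to top with no closing parentheses to balance:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(normalize);

# Hypothetical helper: take space-separated hex code points and
# return the hex code points of their canonical decomposition (NFD).
sub decompose_hex {
    my ($hex_list) = @_;
    return join " ",
        map sprintf("%04X", $_),   # format each code point as hex
        unpack 'U*',               # string -> list of code points
        normalize 'D',             # canonical decomposition (NFD)
        pack 'U*',                 # code points -> string
        map hex($_),               # hex text -> numbers
        split ' ', $hex_list;      # split on whitespace
}

# Hypothetical fallback input when no argument is given.
my $input = @ARGV ? shift(@ARGV) : '00E9 00C5';
print decompose_hex($input), "\n";
```

With no argument it prints "0065 0301 0041 030A": U+00E9 and U+00C5 each split into a base letter plus a combining mark.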

I would also suggest making the form an optional parameter
that defaults to 'C', since it is the most commonly required form,
e.g. for W3C-normalized text [1] or normalized text in IETF
protocols [2] (both currently drafts, anyway). Supporting a
default form would require making the form the second parameter.
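A minimal sketch of the optional-form proposal, assuming Unicode::Normalize for the actual work. The wrapper name normalize_text is hypothetical and is not part of any module; it takes STRING first and an optional FORM second, falling back to 'C' (NFC):

```perl
use strict;
use warnings;
use Unicode::Normalize qw(normalize);

# Hypothetical wrapper illustrating the proposal: STRING first,
# FORM second and optional, defaulting to 'C' (NFC).
sub normalize_text {
    my ($string, $form) = @_;
    $form = 'C' unless defined $form;    # default form when omitted
    return normalize($form, $string);
}

my $decomposed = "e\x{0301}";               # 'e' + COMBINING ACUTE ACCENT
my $nfc = normalize_text($decomposed);      # default form: NFC
my $nfd = normalize_text("\x{00E9}", 'D');  # explicit form: NFD

printf "NFC: %s\n", join ' ', map { sprintf '%04X', ord } split //, $nfc;
printf "NFD: %s\n", join ' ', map { sprintf '%04X', ord } split //, $nfd;
```

Only trailing arguments can be left off a Perl call, which is why the default forces the form into second position. With the form-first order, the same convenience can instead come from one-argument functions such as Unicode::Normalize's NFC($string).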

Regards, SADAHIRO Tomoyuki
