perl-unicode

Re: Word boundaries

2012-03-26 05:58:20
How can I check what script a character belongs to?

    $ perl -Mutf8 -MUnicode::UCD=charinfo -E'say charinfo(ord "为")->{script}'
    Han

Sanity checks:

    $ perl -Mutf8 -E'say "为" =~ /\p{Han}/'
    1

    $ uniprops -a1 为 | ack Script
    Script=Han
    Script=Hani
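If all you need is the script name, here is a lighter-weight sketch. Unicode::UCD also exports `charscript`, which takes a code point and returns only its script name, so the full `charinfo` hash isn't built. The helper `script_of` and the sample characters are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                          # sample characters below are UTF-8 source
use Unicode::UCD qw(charscript);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: script name of the first character of a string.
# ord() already looks only at the first character, so no substr is needed.
sub script_of {
    my ($str) = @_;
    return charscript(ord $str);
}

for my $char (split //, "为aΩ") {
    printf "%s %s\n", $char, script_of($char);
}
```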

check if it is the same as the
previous one, i.e. back to C mode of programming.
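For contrast, the character-at-a-time approach quoted above might look roughly like this. It is a sketch, not code from the thread, and `runs_by_hand` is a hypothetical name. Note that `split //` yields code points rather than grapheme clusters, so a combining mark (script Inherited) would start a run of its own.

```perl
use strict;
use warnings;
use utf8;
use Unicode::UCD qw(charscript);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: split a string into same-script runs by walking it
# one code point at a time and comparing each script with the previous one.
sub runs_by_hand {
    my ($str) = @_;
    my @runs;
    my $prev = '';
    for my $char (split //, $str) {
        my $script = charscript(ord $char) // 'Unknown';
        if (@runs && $script eq $prev) {
            $runs[-1] .= $char;    # same script: extend the current run
        }
        else {
            push @runs, $char;     # script changed: start a new run
        }
        $prev = $script;
    }
    return @runs;
}

print "$_\n" for runs_by_hand("abc日本Ωω");
```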

Let the regex engine help you advance the character counter.

    $ cat langs
    ΕλληνικάEnglish한국어日本語Русскийไทย

----

    $ cat langs.pl
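    use 5.010;
    use strictures;
    use Unicode::UCD qw(charinfo);

    # Script name (e.g. "Greek", "Han") of the first character of a string.
    sub script {
        return charinfo(ord substr($_[0], 0, 1))->{script};
    }

    # Under perl -n, $_ holds the current line. With /gc, pos($_) survives
    # each match and \G anchors the next match where the previous one ended.
    while (/\G(\X)/gc) {
        my $first  = $1;
        my $script = script($first);
        # Extend the run with following graphemes that start in this script.
        /\G((?:(?=\p{$script})\X)*)/gc;
        say $first . $1;
    }

----

    $ perl -C -ln langs.pl < langs
    Ελληνικά
    English
    한국어
    日本語
    Русский
    ไทย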
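If you need the runs as data rather than printed lines, the same idea (let the regex engine track the position, here with `\G` and the `/gc` flags) can be packaged as a function. This is a sketch: `script_runs` is a hypothetical name, and it uses `charscript` in place of `charinfo(...)->{script}`.

```perl
use strict;
use warnings;
use utf8;
use Unicode::UCD qw(charscript);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: split $str into runs of grapheme clusters whose
# first character shares a script. pos($str) is advanced by the matches.
sub script_runs {
    my ($str) = @_;
    my @runs;
    while ($str =~ /\G(\X)/gc) {
        my $first  = $1;
        my $script = charscript(ord $first) // 'Unknown';
        # Greedily take following graphemes that start in the same script.
        $str =~ /\G((?:(?=\p{$script})\X)*)/gc;
        push @runs, $first . $1;
    }
    return @runs;
}

print "$_\n" for script_runs("ΕλληνικάEnglish한국어日本語Русскийไทย");
```

One caveat: since Perl 5.26, `\p{Greek}` is short for `\p{Script_Extensions=Greek}`, so a character whose Script is Common but which is shared by several scripts may not match `\p{Common}`. In this sketch such a grapheme simply ends up starting a run of its own.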
