Re: should \d match *all* the digits? faster with woyka

On Wed, Aug 11, 1999 at 10:26:18AM -0700, Larry Wall wrote:

: On Wed, Aug 11, 1999 at 08:52:46AM -0500, A D wrote:
: > Hello Larry
: > 
: > Please read this woyka, it can speed perl 100s time
: > and revolutionize the perl engine and the unicode then.
: > Please let me know in due time what you think

Still, for some applications, this would be a reasonable optimization.

Tim Bunce writes:
: Doesn't seem particularly revolutionary. Many people, including myself,
: have already spoken of using UTF8 'characters' to represent arbitrary
: encodings and using regular expressions to search and manipulate them.

Yes.

: I do agree that it's a powerful concept that could have wide applications.
: Someone just needs to do the leg work and create a module to make it
: easy to use.

The question is how far you have to go with this.  Since unicode is compatible
with ascii, you can still say

    use utf8;
    print "foo\n";

But a utf8 encoding applies to all the strings in its scope.  What should

    use utf8 'woyka_english';
    print "foo\n";

do?  Encode "foo\n" into woykan, probably, and reverse translate on print.

Or not...


I was simply thinking of having a hash of words to utf8coded-integer
'characters' and an inverted hash of the same, plus some functions like:

        $encoded = words2utf8coded(@words);
        @words   = utf8coded2words($encoded);

That kind of thing could be wrapped up into a handy module.
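
A rough sketch of how those two functions might look, assuming a perl
whose strings can hold code points above 255. The word list (@vocab) and
the choice of the Private Use Area starting at U+E000 ($base) are
made up for illustration; only the two function names come from the
description above.

```perl
use strict;
use warnings;

# Hypothetical vocabulary; a real module would load a much larger word list.
my @vocab = qw(the quick brown fox jumps over lazy dog);

# Assumption: start in the Private Use Area (U+E000) so the coded
# 'characters' never collide with ordinary text.
my $base = 0xE000;

# The hash of words to coded characters, and its inverse.
my (%word2char, %char2word);
for my $i (0 .. $#vocab) {
    my $char = chr($base + $i);
    $word2char{ $vocab[$i] } = $char;
    $char2word{$char}        = $vocab[$i];
}

# Encode a list of words as a string holding one character per word.
sub words2utf8coded {
    my @words   = @_;
    my $encoded = '';
    for my $word (@words) {
        die "unknown word: $word\n" unless exists $word2char{$word};
        $encoded .= $word2char{$word};
    }
    return $encoded;
}

# Decode such a string back into the list of words.
sub utf8coded2words {
    my ($encoded) = @_;
    my @words;
    for my $char (split //, $encoded) {
        die sprintf("unknown code point: U+%04X\n", ord $char)
            unless exists $char2word{$char};
        push @words, $char2word{$char};
    }
    return @words;
}

my $encoded = words2utf8coded(qw(the quick brown fox));
my @words   = utf8coded2words($encoded);

# One character per word, so an ordinary regex works at the word level:
# "quick", then any single word, then "fox".
my $pattern = $word2char{quick} . '.' . $word2char{fox};
print "matched\n" if $encoded =~ /$pattern/;
```

The point of the regex at the end is that '.' now means "any one word",
which is where the power of the idea would come from.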

Tim.