Re: C<use utf8> dynamic scope?

Chaim Frenkel writes:
: My current impression of the utf8 support
: is that utf8 == Unicode. (Look at the \p{} support thrown in by
: use utf8.)

Not really.  None of the character property data is hardwired in, though
the code assumes that certain property classes exist.

: This should be split out. 
: 
:       use Unicode;    # Get the unicode attributes
:       use utf8;       # Character encoding format
: 
: Then perhaps the far eastern folks can have their cake and eat it.

You can already say

    use utf8 'Big5';

for that sort of thing--it just defaults to Unicode.  Two caveats:

    * Nobody's written the tables for anything but Unicode yet.
    * You can't mix Unicode with non-Unicode tables currently.

The only construct that's really Unicode-centric is \X, which matches a
"clump", that is, a base character followed by its associated diacritics.
Even that is based on external definitions of what a "mark" is.  The
only Unicode assumption in it is that a base character precedes its
diacritics.
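
As a rough illustration of the "clump" idea, here is a minimal Python sketch, not the actual \X implementation. It assumes that a "mark" is any character whose Unicode general category starts with "M", looked up in the external tables that ship with Python's unicodedata module. The function name clumps is made up for this example.

```python
import unicodedata

def clumps(text):
    # Group each base character with the marks that follow it.
    # Assumption: "mark" means a general category of Mn, Mc, or Me.
    groups = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if groups and is_mark:
            # A mark attaches to the clump started by the preceding base.
            groups[-1] += ch
        else:
            # A base character (or a stray leading mark) starts a new clump.
            groups.append(ch)
    return groups

# "e" + combining acute, "a" + combining diaeresis + combining dot below, "x"
sample = "e\u0301a\u0308\u0323x"
print(len(sample), len(clumps(sample)))
```

The only ordering assumption in the sketch is the one named above: the base comes first and its marks follow. What counts as a mark comes entirely from the table lookup.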

But the basic point remains that any encoding that puts commonly used
characters at code points 2048 (U+0800) and above is going to require at
least three bytes in UTF-8 to represent those common characters.  I've
said from the beginning that we might end up with some sort of
"use utf16" for the default in Eastern countries, but that I wasn't
terribly interested in writing it myself.
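
The byte counts are easy to check. The following Python sketch uses only the standard codecs; the helper name encoded_len and the sample code points are chosen for illustration and are not from the original discussion. It compares how many bytes one character takes in UTF-8, UTF-16, and Big5.

```python
def encoded_len(code_point, encoding):
    # Number of bytes needed to encode a single code point.
    return len(chr(code_point).encode(encoding))

# U+0041 is ASCII "A"; U+07FF is the last two-byte UTF-8 code point;
# U+0800 (2048) is the first three-byte one; U+4E2D is a common CJK ideograph.
for cp in (0x41, 0x7FF, 0x800, 0x4E2D):
    utf8 = encoded_len(cp, "utf-8")
    utf16 = encoded_len(cp, "utf-16-be")  # big-endian, no byte-order mark
    print("U+%04X: utf-8=%d utf-16=%d" % (cp, utf8, utf16))

# The same ideograph in a legacy double-byte encoding.
print("U+4E2D in Big5:", encoded_len(0x4E2D, "big5"))
```

For the ideograph the sketch reports three bytes in UTF-8 against two in both UTF-16 and Big5, which is the size penalty at issue here.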

If you want to get a big headache, think about

    use utf16 'Big5';

You'll note that utf16 is not so very 'u'.

Larry
