perl-unicode

Re: C<use utf8> dynamic scope?

1999-06-01 22:48:23
On Tue, 1 Jun 1999 20:49:29 -0400, Chip Salzenberg 
<chip(_at_)perlsupport(_dot_)com> said:

According to Andreas J. Koenig:
On Tue, 1 Jun 1999 15:54:09 -0400, Chip Salzenberg 
<chip(_at_)perlsupport(_dot_)com> said:

On Thu, 27 May 1999 15:55:39 -0400, Chip Salzenberg 
<chip(_at_)perlsupport(_dot_)com> said:
I don't see a use for anything other than UTF-8.  UTF-8 allows the
encoding of huge character codes (up to 40-some bits), so unless you
know of a need for more-than-40-some-bits per character, UTF-8 is
plenty.

FWIW, UTF-8 is Latin-centric and not so much liked in Eastern countries
because it is byte-bloat compared to native encodings.

I see, I should have said "compared to 16-bit encodings such as UCS-2 or
UTF-16". Sorry for the confusion.
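
To make the byte-bloat point concrete, here is a minimal sketch that compares encoded sizes for a Latin and a Japanese sample string. The sample strings and the helper name encoded_sizes are illustrative assumptions, not something from the thread, and UTF-16 is measured without a byte-order mark.

```python
# Compare how many bytes the same text needs in UTF-8 and in UTF-16.
# "utf-16-be" is used so that no byte-order mark is counted.

def encoded_sizes(text):
    """Return the encoded length of `text` in bytes for both encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }

# Hypothetical sample strings, chosen only for illustration.
samples = {
    "latin": "hello",
    "japanese": "日本語",
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For these samples, the ASCII text doubles in size under UTF-16, while the three kanji take 9 bytes in UTF-8 against 6 in UTF-16.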

Well, that depends on whether you assume Unicode or not.  If you don't
assume Unicode, you can C<use locale> and go to town.

I can't quite follow you, Chip. Do you want to say UTF-8 doesn't imply
Unicode?

That's not what I meant, exactly; I meant that you can just skip the
whole Unicode/UTF-8 issue and use locales and eight-bit characters --
i.e. the status quo ante -- if you like.

However:  *Yes*, UTF-8 is independent of Unicode.

Well, this is the headline of RFC 2279:
UTF-8, a transformation format of ISO 10646

But if you look at what UTF-8 really *is*, it's nothing more than a
way of encoding integers larger than eight bits using sequences of
eight-bit bytes with their high bits set.
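
The "encoding integers as byte sequences" view can be sketched as follows. This assumes the 1-to-6-byte layout of RFC 2279 (values below 2**31); the function name encode_utf8_style is hypothetical, and nothing here is specific to Perl's own implementation.

```python
def encode_utf8_style(n):
    """Encode an integer below 2**31 using the RFC 2279 bit layout."""
    if n < 0 or n >= 1 << 31:
        raise ValueError("value out of range for the 1-6 byte layout")
    if n < 0x80:
        return bytes([n])  # 0xxxxxxx: a single byte with the high bit clear

    # Pick the shortest sequence whose payload bits can hold n.
    for length, payload_bits in ((2, 11), (3, 16), (4, 21), (5, 26), (6, 31)):
        if n < 1 << payload_bits:
            break

    # The lead byte starts with `length` one-bits followed by a zero.
    lead_prefix = (0xFF << (8 - length)) & 0xFF

    out = []
    for _ in range(length - 1):
        out.append(0x80 | (n & 0x3F))  # continuation byte: 10xxxxxx
        n >>= 6
    out.append(lead_prefix | n)        # remaining top bits go in the lead byte
    return bytes(reversed(out))


print(encode_utf8_style(0xE9).hex())        # U+00E9 as a two-byte sequence
print(encode_utf8_style(0x7FFFFFFF).hex())  # largest value in this layout
```

Every byte of a multi-byte sequence has its high bit set, and the scheme itself never asks whether the integer is a Unicode code point.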

OK.

Just because the inventors of UTF-8 use it only for Unicode doesn't
mean that we are also so constrained.

Sure. But for now we really want to use utf8 for Unicode and nothing
else.

So what does C<use locale> buy those who would be willing to use UCS-2
but not UTF-8?

Nothing.  Separate issue.

So there we are. For those who want to have Unicode support but whose
character sets are disadvantaged by UTF-8, an alternative might make
sense. I'm not asking for it, I just wanted to point out that there is
a point in considering other encodings too.

-- 
andreas
