perl-unicode

Re: Encode, take two

2000-09-13 06:39:34
On Wed, Sep 13, 2000 at 09:21:21AM +0100, Nick Ing-Simmons wrote:
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
   bytes_to_utf8($string, $encoding)
   utf8_to_bytes($string, $encoding)

Scratch these.  Bytes are in no encoding.  They are numbers.

Yeah - but it is only a matter of time before we want
to take a bunch of Shift-JIS bytes and turn them into perl chars.

Hmmm...

We will need  
        nativebytes_to_chars($string,$encoding);


I still think this indicates the API is too "implementation centric".
I am worried about perl code having all these representations
spelt out. To _me_ the whole point of the UNICODE approach is
that we can do anything by

        whatever-to-UNICODE, massage, UNICODE-to-wanted.

I think bytes_to_utf8 is a worrying opposite of that - that says we 
start with some "binary" bytes that perl cannot use char ops on,

Assume I have a string, a bunch of bytes that makes sense in Shift-JIS,
as Shift-JIS characters.  Now, how am I going to get it to Unicode?
chars_to_blah() won't help since the bytes are not yet Unicode chars.
So yes, I think you are right, we need the bytes_to_utf8(), and
bytes_to_chars() is then a natural convenience wrapper.
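
To make the Shift-JIS case concrete, here is a minimal sketch of the
"whatever-to-UNICODE, massage, UNICODE-to-wanted" round trip.  It assumes
the decode()/encode() interface of the Encode module as it later shipped
with perl, not the bytes_to_utf8()/bytes_to_chars() names being discussed
in this thread; the sample bytes and variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Four octets that make sense in Shift-JIS: HIRAGANA LETTER A and
# HIRAGANA LETTER I (0x82A0, 0x82A2).  To perl they are just numbers.
my $sjis_bytes = "\x82\xA0\x82\xA2";

# whatever-to-UNICODE: interpret the octets as Shift-JIS characters.
my $chars = decode('shiftjis', $sjis_bytes);

# massage: ordinary character operations now work per character.
my $reversed = reverse $chars;    # scalar context: reverses the string

# UNICODE-to-wanted: serialise the characters as UTF-8 octets.
my $utf8_bytes = encode('UTF-8', $reversed);

printf "%d octets in, %d chars, %d octets out\n",
    length($sjis_bytes), length($chars), length($utf8_bytes);
```

The point of the sketch is that the encoding name only appears at the two
boundaries; everything in between deals in characters.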

and converts it to another sequence of bytes which again are
not perl "chars". If substr() et al. are going to pull out UTF8
encoded bytes (as LDAP needs) then perl cannot say /[[:alpha:]]/.
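
The worry about substr() and character classes can be illustrated with a
small sketch.  It again assumes the later Encode module's decode(), and
default matching rules (no locale, no unicode_strings feature); the sample
data and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded octets for HIRAGANA LETTER A (U+3042):
# three octets that together stand for one character.
my $utf8_bytes = "\xE3\x81\x82";

# The same data decoded into perl characters.
my $chars = decode('UTF-8', $utf8_bytes);

# substr() on the octet string pulls out a lone lead octet;
# on the character string it pulls out the whole letter.
my $first_octet = substr($utf8_bytes, 0, 1);
my $first_char  = substr($chars, 0, 1);

# Ask the same question of both pieces.
my $octet_is_alpha = $first_octet =~ /^[[:alpha:]]$/ ? 1 : 0;
my $char_is_alpha  = $first_char  =~ /^[[:alpha:]]$/ ? 1 : 0;

printf "octets: length %d, first piece alpha? %d\n",
    length($utf8_bytes), $octet_is_alpha;
printf "chars:  length %d, first piece alpha? %d\n",
    length($chars), $char_is_alpha;
```

Only the decoded string gives substr() and the character class something
meaningful to work on, which is the argument for keeping the byte-level
forms at the edges of a program.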

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
