Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
bytes_to_utf8($string, $encoding)
utf8_to_bytes($string, $encoding)
Scratch these. Bytes are in no encoding. They are numbers.
Yeah - but it is only a matter of time before we want
to take a bunch of Shift-JIS bytes and turn them into perl chars.
We will need
nativebytes_to_chars($string,$encoding);
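For comparison, here is a rough sketch of what nativebytes_to_chars would do. It is written in Python rather than Perl, and the function below is only a hypothetical analogue of the proposed name. It takes bytes known to be Shift-JIS and decodes them into characters, using Python's built-in shift_jis codec in place of whatever Perl would provide.

```python
def nativebytes_to_chars(data: bytes, encoding: str) -> str:
    """Hypothetical analogue of the proposed Perl function:
    interpret raw bytes in the named encoding and return characters."""
    return data.decode(encoding)

# "日本" ("Japan") in Shift-JIS uses two bytes per character.
sjis = "日本".encode("shift_jis")
chars = nativebytes_to_chars(sjis, "shift_jis")
print(len(sjis), len(chars))  # 4 2
```

Once decoded, length, slicing and similar operations count characters rather than bytes.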
I still think this indicates the API is too "implementation-centric".
I am worried about perl code having all these representations
spelt out. To _me_ the whole point of the UNICODE approach is
that we can do anything by
whatever-to-UNICODE, massage, UNICODE-to-wanted.
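The whatever-to-UNICODE, massage, UNICODE-to-wanted flow can be sketched as three explicit stages. This sketch is in Python because it keeps text and bytes in separate types. The name `recode` and the choice of upper-casing as the "massage" step are illustrative assumptions.

```python
def recode(data: bytes, from_enc: str, to_enc: str) -> bytes:
    # 1. whatever-to-UNICODE: decode once, at the input boundary.
    text = data.decode(from_enc)
    # 2. massage: work on characters, never on encoded bytes
    #    (upper-casing is just a stand-in for real processing).
    text = text.upper()
    # 3. UNICODE-to-wanted: encode once, at the output boundary.
    return text.encode(to_enc)

latin1 = "café".encode("latin-1")        # b'caf\xe9'
print(recode(latin1, "latin-1", "utf-8"))  # b'CAF\xc3\x89'
```

Only the two boundary calls know about encodings. The processing step in the middle never has to care which representation the data arrived in or will leave in.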
I think bytes_to_utf8 is a worrying opposite of that - it says we
start with some "binary" bytes that perl cannot use char ops on,
and convert them to another sequence of bytes which again are
not perl "chars". If substr() et al. are going to pull out UTF-8
encoded bytes (as LDAP needs) then perl cannot say /[[:alpha:]]/.
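To make the concern concrete, here is a small Python analogy. It does not describe Perl's internals. When character operations run on UTF-8-encoded bytes, a substr-like slice can cut a multi-byte character in half, and an alphabetic test sees byte values instead of letters.

```python
word = "naïve"
utf8 = word.encode("utf-8")   # "ï" (U+00EF) becomes the two bytes 0xC3 0xAF

# Character view: slicing and alphabetic tests operate on letters.
print(word[:3], word.isalpha())   # naï True

# Byte view: the "same" slice cuts "ï" in half.
head = utf8[:3]
print(head)                       # b'na\xc3'
try:
    head.decode("utf-8")
except UnicodeDecodeError:
    print("slice ended mid-character")

# bytes.isalpha() only recognises ASCII letters, so the word no longer
# looks alphabetic once it is treated as raw UTF-8 bytes.
print(utf8.isalpha())             # False
```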
--
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.