Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
=head1 NAME
Encode - character encodings
=head2 TERMINOLOGY
byte a number in the range 0..255
char a character in the range 0..maxint (at least 2**32-1)
The marker [INTERNAL] marks Internal Implementation Details, in
general meant only for those who think they know what they are doing;
such details may change in future releases.
=head2 bytes
bytes_to_utf8(STRING)
The bytes in STRING are encoded in-place into UTF-8. Returns the new
size of STRING, or undef if there's a failure. [INTERNAL] Also the
UTF-8 flag is turned on.
Is this a C or a Perl API?
If it is a Perl API, then converting to UTF-8 means that substr() is going
to give me a sequence of bytes which encode the string. As such, they
have to have the internal UTF8 flag turned off.
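To make the point about the flag concrete, here is a minimal sketch. It assumes the core utf8::upgrade(), utf8::encode() and utf8::is_utf8() functions of later Perl releases, which are not part of the draft being discussed. It shows that once a string has been turned into its UTF-8 encoding, length() and substr() see bytes and the internal flag is off.

```perl
use strict;
use warnings;

# A character string with one non-ASCII character (U+00E9).
my $str = "caf\x{e9}";
utf8::upgrade($str);               # force the internal UTF-8 representation
my $chars = length($str);          # counted in characters

# Assumption: utf8::encode() stands in for the conversion under discussion.
# It rewrites its argument in place as UTF-8 octets and clears the flag.
my $bytes = $str;
utf8::encode($bytes);

printf "chars=%d bytes=%d flag=%s\n",
    $chars, length($bytes), (utf8::is_utf8($bytes) ? "on" : "off");

# substr() now hands back individual octets of the encoding.
printf "octet 3 = 0x%02X\n", ord(substr($bytes, 3, 1));
```

With the flag off, the result can be passed to syswrite() or any other byte-oriented interface without further conversion.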
=head2 chars
chars_to_utf8(STRING)
The chars in STRING are encoded in-place into UTF-8. The chars are
assumed to be encoded in ISO 8859-1 (Latin 1) or US-ASCII.
You took my name and used it exactly the opposite way to what I intended.
Maybe my name was not as clear as I thought.
My intent was that STRING is _ANY_ string in perl's internal representation.
The returned string is a sequence of bytes (0..255) which are the
encoding of that string.
My names were meant to be used like this:
sysread(Handle,$buffer,...); # buffer seq of bytes
my $str = utf8_to_chars(substr($buffer,$start,$len));
# now we have string of chars and we can use char ops ...
my @words;
foreach (split(/\s/,$str))
{
push(@words,ucfirst(lc($_)));
}
my $newstr = join(' ',@words);
# get back byte stream that protocol needs
my $bytes = chars_to_utf8($newstr);
syswrite(Handle,$bytes);
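A self-contained sketch of the same round trip, for anyone who wants to run it. It assumes the decode() and encode() functions of the Encode module as it later shipped with Perl; they play the roles of utf8_to_chars() and chars_to_utf8() here. A literal byte string stands in for the sysread() buffer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for a buffer filled by sysread(): raw UTF-8 octets.
my $buffer = "h\xc3\xa9llo w\xc3\xb6rld";

# Bytes in, characters out (the utf8_to_chars() role).
my $str = decode('UTF-8', $buffer);

# Character operations now work on whole characters, not octets.
my @words;
foreach (split(/\s/, $str)) {
    push(@words, ucfirst(lc($_)));
}
my $newstr = join(' ', @words);

# Characters in, bytes out (the chars_to_utf8() role).
my $bytes = encode('UTF-8', $newstr);

printf "chars=%d bytes=%d\n", length($newstr), length($bytes);
```

The character string is 11 characters long while the byte stream is 13 octets, which is exactly the distinction the two function names are meant to carry.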
You could have
my $str = shiftJIS_to_chars(); # or bytes_to_chars($buffer,'shiftJIS')
...
my $bytes = chars_to_shiftJIS(); # or chars_to_bytes($str,'shiftJIS')
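The generic two-argument form could be sketched as below. bytes_to_chars() and chars_to_bytes() are hypothetical wrappers written for this illustration, and the sketch assumes the later Encode module with its 'shiftjis' encoding name; none of this is an existing API in the draft.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical wrappers using the names suggested above; not part of Encode.
sub bytes_to_chars {
    my ($bytes, $encoding) = @_;
    return decode($encoding, $bytes);
}

sub chars_to_bytes {
    my ($chars, $encoding) = @_;
    return encode($encoding, $chars);
}

# Shift JIS octets for the two characters U+65E5 U+672C.
my $buffer = "\x93\xfa\x96\x7b";

my $str   = bytes_to_chars($buffer, 'shiftjis');   # 2 characters
my $bytes = chars_to_bytes($str, 'shiftjis');      # 4 octets again

printf "chars=%d bytes=%d first=U+%04X\n",
    length($str), length($bytes), ord(substr($str, 0, 1));
```

Passing the encoding as an argument keeps a single pair of functions, at the cost of a run-time lookup by name; per-encoding names such as shiftJIS_to_chars() could simply be thin wrappers over it.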
--
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.