perl-unicode

Re: Encode, take three

2000-09-13 01:39:46
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
=head1 NAME

Encode - character encodings

=head2 TERMINOLOGY

      byte    a number in the range 0..255
      char    a character in the range 0..maxint (at least 2**32-1)

The marker [INTERNAL] marks Internal Implementation Details. These are
generally meant only for those who think they know what they are doing,
and such details may change in future releases.

=head2 bytes

      bytes_to_utf8(STRING)

The bytes in STRING are encoded in-place into UTF-8.  Returns the new
size of STRING, or undef if there's a failure.  [INTERNAL] Also the
UTF-8 flag is turned on.
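As a hedged illustration only: bytes_to_utf8() is a name proposed in this draft, not a function you can call. The closest core Perl builtin to the behaviour described (convert in place, return a size, turn the UTF-8 flag on) is utf8::upgrade, which returns the number of octets in the UTF-8 form. The test string is an arbitrary choice.

```perl
use strict;
use warnings;

# "caf" followed by U+00E9 (e-acute), stored one byte per char.
my $s = "caf\xE9";
my $before = utf8::is_utf8($s) ? 1 : 0;   # 0: native 8-bit representation

# Re-encode the internal buffer as UTF-8 in place; returns the octet count.
my $octets = utf8::upgrade($s);

# length() still counts characters (4); only the internal storage changed.
printf "flag before=%d after=%d, octets=%d, length=%d\n",
    $before, (utf8::is_utf8($s) ? 1 : 0), $octets, length($s);
```

Note that the string still compares equal to "caf\xE9"; only its internal representation differs.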

Is this a C or a Perl API?

If it is a Perl API, then converting to UTF-8 means that substr() is going
to give me a sequence of bytes which encode the string. As such, those
bytes must have the internal UTF-8 flag turned off.
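A small sketch of the Perl-level behaviour argued for here, using the core builtin utf8::encode as a stand-in for the proposed conversion (that mapping is an assumption; this draft does not specify it). After encoding, length() and substr() operate on octets and the UTF-8 flag is off.

```perl
use strict;
use warnings;

# 6 chars; the last one is U+263A (white smiling face).
my $str = "caf\x{E9} \x{263A}";

my $bytes = $str;        # copy, because utf8::encode works in place
utf8::encode($bytes);    # now a sequence of UTF-8 octets, flag off

print length($str), " chars, ", length($bytes), " bytes\n";
printf "%vX\n", substr($bytes, 3, 2);   # the two octets encoding U+00E9
```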


=head2 chars

      chars_to_utf8(STRING)

The chars in STRING are encoded in-place into UTF-8.  The chars are
assumed to be encoded in ISO 8859-1 (Latin 1) or US-ASCII.
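Under the Latin-1 assumption, each char 0x00..0xFF maps to UTF-8 by a fixed rule: values below 0x80 are copied unchanged, and the rest become two octets, 0xC0 | (c >> 6) followed by 0x80 | (c & 0x3F). The helper latin1_to_utf8_bytes below is hypothetical and exists only to show that arithmetic. It is checked against Perl's own utf8::encode.

```perl
use strict;
use warnings;

# Hypothetical hand-rolled Latin-1 -> UTF-8 transcoder, for illustration only.
sub latin1_to_utf8_bytes {
    my ($latin1) = @_;
    my $out = '';
    for my $c (unpack 'C*', $latin1) {
        if ($c < 0x80) {
            $out .= chr($c);                   # ASCII: one octet, unchanged
        } else {
            $out .= chr(0xC0 | ($c >> 6))      # lead octet: 110xxxxx
                  . chr(0x80 | ($c & 0x3F));   # continuation: 10xxxxxx
        }
    }
    return $out;
}

my $in  = "na\xEFve";               # "naive" with i-diaeresis, in Latin-1 (5 bytes)
my $out = latin1_to_utf8_bytes($in);

my $ref = $in;
utf8::encode($ref);                 # Perl's own conversion, for comparison

printf "%vX\n", $out;
print $out eq $ref ? "match\n" : "mismatch\n";
```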

You took my name and used it in exactly the opposite way from what I intended.
Maybe my name was not as clear as I thought.

My intent was that STRING is _ANY_ string in perl's internal representation.
The returned string is a sequence of bytes (0..255) which are the 
encoding of that string.

My names were meant to be used like this:

   sysread(Handle,$buffer,...);   # $buffer is a sequence of bytes
   my $str = utf8_to_chars(substr($buffer,$start,$len));
   # now we have a string of chars and can use char ops ...
   my @words;
   foreach (split(/\s/,$str))
    {
     push(@words,ucfirst(lc($_)));
    }
   my $newstr = join(' ',@words);
   # get back the byte stream that the protocol needs
   my $bytes  = chars_to_utf8($newstr);
   syswrite(Handle,$bytes);
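utf8_to_chars() and chars_to_utf8() above are proposed names. A runnable approximation of the same round trip, assuming the Encode module's decode() and encode() with the 'UTF-8' encoding as stand-ins, and a literal string in place of the sysread() buffer, could look like this:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for what sysread() would deliver: UTF-8 octets.
# U+00D6 is O with diaeresis.
my $buffer = encode('UTF-8', "hELLO w\x{D6}RLD");

# bytes -> chars (the role of the proposed utf8_to_chars)
my $str = decode('UTF-8', $buffer);

# char operations now see characters, not octets
my @words  = map { ucfirst(lc($_)) } split /\s+/, $str;
my $newstr = join ' ', @words;

# chars -> bytes (the role of the proposed chars_to_utf8), ready for syswrite()
my $bytes = encode('UTF-8', $newstr);

printf "%d chars, %d bytes\n", length($newstr), length($bytes);
```

Because lc() and ucfirst() run on the decoded string, the O with diaeresis is lowercased as a single character; running them on the raw octets would not handle it correctly.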

You could also have:

   my $str = shiftJIS_to_chars();   # or bytes_to_chars($buffer,'shiftJIS')

...
   my $bytes = chars_to_shiftJIS(); # or chars_to_bytes($str,'shiftJIS')
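shiftJIS_to_chars(), chars_to_shiftJIS() and the two-argument forms are proposed names only. As an assumption-labelled sketch, Encode's generic decode() and encode() with the encoding name 'shiftjis' (provided by Encode::JP, which ships with the Encode distribution) give the same pair of conversions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $str  = "\x{65E5}\x{672C}";         # two kanji, U+65E5 U+672C
my $sjis = encode('shiftjis', $str);   # chars -> Shift_JIS bytes
my $back = decode('shiftjis', $sjis);  # Shift_JIS bytes -> chars

printf "%d chars -> %d bytes: %vX\n", length($str), length($sjis), $sjis;
print $back eq $str ? "round trip ok\n" : "round trip failed\n";
```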


-- 
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.
