perl-unicode

Encode, take three

2000-09-12 11:56:59
=head1 NAME

Encode - character encodings

=head2 TERMINOLOGY

        byte    a number in the range 0..255
        char    a character in the range 0..maxint (at least 2**32-1)

The marker [INTERNAL] flags Internal Implementation Details.  These are
meant only for those who think they know what they are doing, and they
may change in future releases.

=head2 bytes

        bytes_to_utf8(STRING)

The bytes in STRING are encoded in-place into UTF-8.  Returns the new
size of STRING, or undef if there's a failure.  [INTERNAL] Also the
UTF-8 flag is turned on.

        utf8_to_bytes(STRING [, STRICT])

The UTF-8 in STRING is decoded in-place into bytes.  Returns the new
size of STRING, or undef if there's a failure, or dies if STRICT is
true and the UTF-8 in STRING is malformed.  [INTERNAL] The UTF-8 flag
of STRING is not checked.
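
The functions above are proposals, so they cannot be run as written.  As a
rough sketch of the same in-place round trip, the following assumes the
utf8::upgrade and utf8::downgrade built-ins of later Perl releases (5.8 and
up) as stand-ins.  Neither is named in this draft, and their return values
differ from the proposal: utf8::downgrade returns true or false, not a size.

```perl
use strict;
use warnings;

# Assumption: utf8::upgrade / utf8::downgrade stand in for the proposed
# bytes_to_utf8 / utf8_to_bytes; they are not part of this draft.
my $string = "caf\xE9";                   # four bytes; the last is 0xE9

my $octets  = utf8::upgrade($string);     # in-place; returns the UTF-8 octet count
my $flag_on = utf8::is_utf8($string);     # true: the UTF-8 flag is now set

my $ok = utf8::downgrade($string, 1);     # back to bytes; FAIL_OK avoids dying

my $wide   = "\x{263A}";                  # a char that does not fit in a byte
my $failed = !utf8::downgrade($wide, 1);  # returns false instead of dying
```

Without the second argument, utf8::downgrade dies on the wide char, which is
close to what STRICT asks for in the proposal.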

=head2 chars

        chars_to_utf8(STRING)

The chars in STRING are encoded in-place into UTF-8.  The chars are
assumed to be encoded in ISO 8859-1 (Latin 1) or US-ASCII.  Returns
the new size of STRING, or undef if there's a failure.  [INTERNAL]
Also the UTF-8 flag is turned on.

        utf8_to_chars(STRING)

The UTF-8 in STRING is decoded in-place into chars.  The chars are
assumed to be in ISO 8859-1 (Latin 1) or US-ASCII.  Returns the new
size of STRING, or undef if there's a failure.  [INTERNAL] The UTF-8
flag of STRING is not checked.

        utf8_to_chars_strict(STRING)

The UTF-8 in STRING is decoded in-place into chars.  Returns the new
size of STRING, or dies if the UTF-8 in STRING is malformed.
[INTERNAL] The UTF-8 flag of STRING is not checked.
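
To make the difference between the lenient and the strict decode concrete,
here is a minimal sketch.  It assumes Encode::encode and Encode::decode from
the Encode module as later released, which this draft does not name.  Unlike
the proposal they return a new string and leave the argument alone, and the
Encode::FB_CROAK check value plays the role of the strict variant.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: Encode::encode / Encode::decode stand in for the proposed
# chars_to_utf8 / utf8_to_chars; unlike the draft, they return a new string.
my $chars = "caf\xE9";                    # Latin 1 chars
my $utf8  = encode("UTF-8", $chars);      # "caf\xC3\xA9", five octets

my $back = decode("UTF-8", $utf8);        # four chars again

# Strict decoding: FB_CROAK makes decode die on malformed input.
my $bad  = "caf\xE9";                     # a lone 0xE9 is not valid UTF-8
my $died = !eval { decode("UTF-8", $bad, Encode::FB_CROAK); 1 };
```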

=head2 chars With Encoding

        chars_to_utf8(STRING, ENCODING)

The chars in STRING encoded in ENCODING are recoded in-place into
UTF-8.  Returns the new size of STRING, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag of STRING is turned on.

        utf8_to_chars(STRING, ENCODING [, STRICT])

The UTF-8 in STRING is decoded in-place into chars encoded in
ENCODING.  Returns the new size of STRING, or undef if there's a
failure, or dies if STRICT is true and the UTF-8 in STRING is
malformed.  [INTERNAL] The UTF-8 flag of STRING is not checked.

        from_to(STRING, FROM_ENCODING, TO_ENCODING [, STRICT])

The chars in STRING encoded in FROM_ENCODING are recoded in-place into
TO_ENCODING.  Returns the new size of STRING, or undef if there's a
failure, or dies if STRICT is true and mapping between the encodings
is impossible.
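
A sketch of the from_to behaviour described above, assuming the
Encode::from_to function as it later shipped in the Encode module.  The
encoding names and the use of Encode::FB_CROAK in place of a boolean STRICT
are assumptions taken from that module, not from this draft.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Assumption: Encode::from_to as later released; FB_CROAK plays the role of STRICT.
my $data = "caf\xE9";                               # Latin 1 octets
my $size = from_to($data, "iso-8859-1", "utf-8");   # in-place; returns the new length

# U+263A has no Latin 1 mapping, so a strict conversion dies.
my $smiley = "\xE2\x98\xBA";                        # U+263A as UTF-8 octets
my $died = !eval { from_to($smiley, "utf-8", "iso-8859-1", Encode::FB_CROAK); 1 };
```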

=head2 Testing For UTF-8

        is_utf8(STRING [, STRICT])

[INTERNAL] Test whether the UTF-8 flag is turned on in the STRING.  By
default the data in STRING is B<not> checked for being well-formed
UTF-8.  If STRICT is true, the data in STRING is also checked for being
well-formed UTF-8.  Returns true if the flag is on and, when STRICT is
true, the data is well-formed; returns false otherwise.
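
The distinction between the flag and the data can be sketched as follows,
assuming Encode::is_utf8 and Encode::decode from the Encode module as later
released (an assumption, since the draft only proposes the interface).

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

my $octets = "caf\xC3\xA9";               # well-formed UTF-8 data, flag off
my $chars  = decode("UTF-8", $octets);    # same text, flag on

my $flag_octets = is_utf8($octets);       # false: only the flag is tested
my $flag_chars  = is_utf8($chars);        # true
my $checked     = is_utf8($chars, 1);     # true: flag on and data well-formed
```

Note that $octets holds perfectly good UTF-8 and still tests false, because
the test looks at the flag and not at the data.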

=head2 Toggling UTF-8-ness

        on_utf8(STRING)

[INTERNAL] Turn on the UTF-8 flag in STRING.  The data in
STRING is B<not> checked for being well-formed UTF-8.  Do not
use unless you B<know> that the STRING is well-formed UTF-8. 
Returns nothing.

        off_utf8(STRING)

[INTERNAL] Turn off the UTF-8 flag in STRING.  Do not use
frivolously.  Returns nothing.
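
What toggling the flag does, and does not do, can be seen in a short
sketch.  It assumes Encode::_utf8_on and Encode::_utf8_off from the Encode
module as later released, used here as stand-ins for the proposed on_utf8
and off_utf8.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: Encode::_utf8_on / _utf8_off stand in for on_utf8 / off_utf8.
my $string = "caf\xC3\xA9";               # five octets of well-formed UTF-8
my $before = length($string);             # 5: counted as bytes

Encode::_utf8_on($string);                # data untouched, only the flag changes
my $during = length($string);             # 4: now counted as chars

Encode::_utf8_off($string);
my $after = length($string);              # 5 again
```

The bytes stored in $string never change; only their interpretation does,
which is why turning the flag on over malformed data is dangerous.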

=head2 UTF-16 and UTF-32 Encodings

        utf_to_utf(STRING, FROM, TO [, STRICT])

The data in STRING is converted from Unicode Transformation Format FROM
to Unicode Transformation Format TO.  Both FROM and TO may be any of
the following:

        '7'     UTF-7
        '8'     UTF-8
        '16be'  UTF-16 big-endian
        '16le'  UTF-16 little-endian
        '32be'  UTF-32 big-endian
        '32le'  UTF-32 little-endian

UTF-16 works in 16-bit (2-byte) units and is closely related to UCS-2,
and UTF-32 works in 32-bit (4-byte) units and is closely related to
UCS-4.  Returns the new size of STRING, or undef if there's a failure,
or dies if STRICT is true and FROM is '8' and the UTF-8 in STRING is
malformed.  [INTERNAL] Even if STRICT is true and FROM is '8' the UTF-8
flag of STRING is not checked.  If TO is '8' also the UTF-8 flag of
STRING is turned on.
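
As an illustration of what such a conversion does to the bytes, the sketch
below assumes Encode::from_to from the Encode module as later released.  The
encoding names "UTF-8", "UTF-16BE" and "UTF-32LE" are that module's, and are
assumed here to correspond to the '8', '16be' and '32le' tokens of this draft.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Assumption: Encode::from_to stands in for the proposed utf_to_utf.
my $data = "A\xC3\xA9";                           # "A" and U+00E9 as UTF-8: three octets
my $size = from_to($data, "UTF-8", "UTF-16BE");   # in-place; no byte-order mark is added
my $hex  = unpack("H*", $data);                   # "004100e9"

my $le = "A";
from_to($le, "UTF-8", "UTF-32LE");                # least significant byte first
my $le_hex = unpack("H*", $le);                   # "41000000"
```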

=cut

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
