perl-unicode

Advance warning of tweaks to Encode API.

2002-02-01 04:40:37
Nick Ing-Simmons <nick(_at_)ing-simmons(_dot_)net> writes:
  You can use t/table.euc under Jcode module for instance.  table.utf8
in my code example is just a utf8 version thereof. That's data which
contains all characters defined in EUC (well, actually JISX0212 is not
included but very few environments can display JISX0212).

It is really great to have some valid data!

For a start it has found a bug in the :encoding layer - I knew there must be some...
(I think I have rediscovered the multi-byte char spanning buffer boundary
bug ... which I could not reproduce before)

That is it - :encoding needs some serious re-work for any encoding
which will whinge about partial characters (8-bit never does, and 16-bit
is unlikely to with even-length buffers - but multi-byte encodings can).
But since layers are much more stable now it can be recoded in a
better manner anyway.
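
To illustrate the buffer-boundary problem described above, here is a minimal
sketch. It makes several assumptions that are not part of this message: UTF-8
stands in for a multi-byte encoding, a 4-octet chunk size stands in for the
layer's buffer, and the carry-over loop uses the FB_QUIET check value from
later released versions of Encode rather than the API being discussed here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three hiragana characters; each is 3 octets in UTF-8 (9 octets total).
my $text   = "\x{3042}\x{3044}\x{3046}";
my $octets = encode('UTF-8', $text);

# Assumed 4-octet "buffer": the chunk boundaries fall inside characters.
my @chunks = unpack '(a4)*', $octets;

# Naive approach: decode every chunk on its own. The split characters
# are treated as malformed and come back as replacement characters.
my $naive = join '', map { my $c = $_; decode('UTF-8', $c) } @chunks;

# Carry-over approach: with FB_QUIET, decode returns what it could
# process and leaves the unprocessed tail in $pending for the next pass.
# (Genuinely malformed input would also stay in $pending; a real layer
# has to tell the two cases apart, which is the point of this thread.)
my ($pending, $carried) = ('', '');
for my $chunk (@chunks) {
    $pending .= $chunk;
    $carried .= decode('UTF-8', $pending, Encode::FB_QUIET());
}

print "naive matches:   ", ($naive   eq $text ? "yes" : "no"), "\n";
print "carried matches: ", ($carried eq $text ? "yes" : "no"), "\n";
```

The naive result differs from the original text, while the carry-over
result matches it.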

To do that it needs to know why encode/decode stopped - did they "fail"
or just "pause"? So the ->decode and ->encode methods are going to get tweaked
as hinted at in the existing pod.

I am currently leaning towards allowing "check" to be a reference,
something like:

$uni = $enc->decode($octets);        # best attempt + replacement chars
$uni = $enc->decode($octets,0);      # croak on error ?
$uni = $enc->decode($octets,1);      # stop on error
$uni = $enc->decode($octets,\$err);  # stop on error, reason code in $err
$uni = $enc->decode($octets,\&foo);  # Call foo on error - protocol TBD
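
As a rough sketch of how such a "check" argument might be dispatched, the
following toy decoder handles each of the five forms listed above. Everything
in it is hypothetical: toy_decode is not part of Encode, the "encoding" is
plain 7-bit ASCII with any octet >= 0x80 counted as an error, the reason
placed in $err is a string rather than a code, and the callback protocol
(called with the offending octet, returns a replacement string) is only one
possible guess since the real protocol is still TBD.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical toy decoder: 7-bit ASCII only, octets >= 0x80 are errors.
sub toy_decode {
    my ($octets, $check) = @_;
    my $out = '';
    for my $octet (split //, $octets) {
        if (ord($octet) < 0x80) {
            $out .= $octet;
            next;
        }
        if (!defined $check) {               # no check: replacement char
            $out .= "\x{FFFD}";
        }
        elsif (ref($check) eq 'CODE') {      # callback (assumed protocol)
            $out .= $check->($octet);
        }
        elsif (ref($check) eq 'SCALAR') {    # stop, report reason in $err
            $$check = sprintf 'no mapping for octet 0x%02X', ord($octet);
            last;
        }
        elsif ($check) {                     # true value: stop on error
            last;
        }
        else {                               # 0: croak on error
            croak sprintf('cannot decode octet 0x%02X', ord($octet));
        }
    }
    return $out;
}

my $err;
my $stopped  = toy_decode("hi\xFFyo", 1);              # returns "hi"
my $reported = toy_decode("hi\xFFyo", \$err);          # "hi", $err is set
my $patched  = toy_decode("hi\xFFyo", sub { '?' });    # returns "hi?yo"
```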

I need to think through a sane set of "numeric" check options - perhaps
a "mask" of which errors are croak/replace/stop/ignored?

I think you can deduce something from the return value as well,
e.g. if it returns +ve length but does not consume the whole string
     then that is the result so far. To find out why,
     call it again - undef means no representation
                   - defined but zero length means partial char
                   - +ve length means we had run out of room
                     (does not occur at perl level as SV can grow...)
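
The three cases above could be read off the second call's return value
along these lines. This is only a sketch of the idea: why_stopped is a
hypothetical helper, and the strings it returns are illustrative labels,
not anything defined by Encode.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the result of calling decode/encode
# again on the input that the first call left unconsumed.
sub why_stopped {
    my ($retry) = @_;
    return 'no representation' if !defined $retry;       # undef
    return 'partial character' if length($retry) == 0;   # defined, empty
    return 'out of room';                                # +ve length
}
```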




--
Nick Ing-Simmons
http://www.ni-s.u-net.com/

