perl-unicode

[Encode] new CHECK specifications

2002-04-18 17:10:56
On Friday, April 19, 2002, at 05:01, Nick Ing-Simmons wrote:
I am not sure when the change went in, but current Encode.xs
has broken Tk804.

Ouch.

Calling

  $encoding->decode($string,1)

now croaks if a character does not map. Croaking is fine as a default
for checking but Tk would like a value of check which does not croak,
but just returns leaving $string starting with the failing character.
I could do a G_EVAL but that is a lot of overhead, and does not tell me
which character position failed (unless $string is updated before
the croak.)
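
To make the requested behaviour concrete, here is a minimal C sketch. It is not code from Encode.xs: the function name encode_latin1_quiet and its signature are hypothetical, and the only fact it relies on is that ISO-8859-1 covers code points 0x00 through 0xFF. It stops at the first character that does not map and reports how many characters were consumed, so the caller learns the failing position without any exception handling.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not part of Encode.xs).
 * Encodes code points from src into ISO-8859-1 bytes in dst, which must
 * have room for len bytes. Stops quietly at the first code point that
 * ISO-8859-1 cannot represent and returns the number of code points
 * consumed. A return value smaller than len means src[return value]
 * is the failing character.
 */
size_t encode_latin1_quiet(const uint32_t *src, size_t len, unsigned char *dst)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (src[i] > 0xFF)
            break;                      /* unmappable: stop, do not croak */
        dst[i] = (unsigned char)src[i];
    }
    return i;
}
```

A caller doing font probes of the kind described would compare the return value with len: equal means the encoding can represent the whole string, and anything smaller is the index of the first character to try against another font.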

Yikes. I DID fix the behavior as documented. But it was not just Encode::CN::HZ that was taking advantage of an UNDOCUMENTED feature after all :).

(Tk does 10,000s of probes - found a character XXXX, have font
with encoding YYYY, can YYYY encode XXXX ?  I hope to reduce that
number by refining the code but it will still do a lot)

With current Encode I don't get to try any interesting fonts
because it croaks when Tk asks iso-8859-1 if it can do the interesting
character :-(

~!@#$%^&*()_+  (My feeling expressed in octet stream :)

Right now we have:
  check == 0,  fallback char   (New and overdue - thanks!)
  check == -1, perlqq \x{xxxx} style croak

Ah, it does not croak.  It FALLS BACK that way.

  otherwise \N{U+XXXX} style croak

(Did \N{U+XXXX} get (back) in ? - I seem to recall it got removed once.)

Didn't touch that part.

You have established the principle of check values meaning something
(which was always the plan).

Can I suggest though that we make it a bit mask - a stab at an initial
set of bits :
  check == 0 - fallback
  (check & 3) == 1 - croak
  (check & 3) == 2 - warn
  (check & 3) == 3 - silent return
  (check & 4)      - \x{xxxx} vs \N{U+XXXX}
If you like, make the $string adjustment optional:
  (check & 8)      - update $string vs. don't bother to update $string

Looks good to me. Maybe I should add constants for that. I may change which bits mean what, however.

Thus
  check == 0  - fallbacks
  check == 1  - \N{U+XXXX} croak
  check == 2  - \x{XXXX} croak
  check == 3  - silent fail
  check == 4  - Uninteresting
  check == 5  - \N{U+XXXX} warn
  check == 6  - \x{XXXX} warn
  check == 11 - silent fail with $string updated (What Tk wants)
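
The proposed bits can be decoded mechanically, as in the C sketch below. The constant and function names are hypothetical, not real Encode constants. Two polarities are assumptions inferred from the listed values rather than stated outright: bit 4 set selects \x{XXXX} (from 1 and 6), and bit 8 set means $string is updated (from 11).

```c
/* Hypothetical names for the proposed bits; not real Encode constants. */
enum {
    CHECK_ACTION_MASK = 3,  /* low two bits select the action            */
    CHECK_CROAK       = 1,
    CHECK_WARN        = 2,
    CHECK_SILENT      = 3,
    CHECK_PERLQQ      = 4,  /* assumed: set = \x{XXXX}, clear = \N{U+XXXX} */
    CHECK_UPDATE      = 8   /* assumed: set = update $string              */
};

/* Action chosen by the low two bits; 0 there means "use a fallback char". */
const char *check_action(int check)
{
    switch (check & CHECK_ACTION_MASK) {
    case CHECK_CROAK:  return "croak";
    case CHECK_WARN:   return "warn";
    case CHECK_SILENT: return "silent";
    default:           return "fallback";
    }
}

/* Escape style used when reporting the unmappable character. */
const char *check_style(int check)
{
    return (check & CHECK_PERLQQ) ? "\\x{XXXX}" : "\\N{U+XXXX}";
}

/* Nonzero if the caller wants $string left at the failing character. */
int check_updates_string(int check)
{
    return (check & CHECK_UPDATE) != 0;
}
```

Under this reading, 1, 3, 6 and 11 decode as listed. The value 2 decodes as warn with \N{U+XXXX} and 5 as croak with \x{XXXX}, which is the reverse of how those two entries are written in the list.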

Better schemes welcome.

What good timing. I was about to release the next version. I'll take a shower, implement them, and possibly add test suites for them before the release.

Another alternative hinted at in old pods was passing check as an SV.
Then if SV was a scalar ref, then set $str to point at fail and return
reason code in the scalar.

This one is very attractive, but too attractive when code freeze is near. So let's go with bit masks for the time being.

PS:

To pick nits - Encode.xs's "layout" looks rather peculiar
with perl source's default tab setting of 8 and expected indent of 4,
and many of the files you have touched now have trailing whitespace
on ends of lines.

I've noticed that. The trailing spaces must be due to patch after patch being applied (that happens when you paste directly). That has already been fixed in the upcoming version
(I applied "indent-buffer" in Emacs :).

Dan the Encode Maintainer