perl-unicode

Re: Encode test problems in EBCDIC

2002-02-22 04:41:22
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:
  Dan, in case EBCDIC scares you (and it should :-), a quick intro:
  basically, consider the whole low 256 characters being rearranged from
  what they are in ASCII.  For example, ord("A") is 0xC1, not 0x41. (The
  pod/perlebcdic.pod has the full tables.)

  Sure it does scare me.  I have to confess UTF-EBCDIC was totally out
of mind.  But here I got a hint: like perl used to be, CJK
encodings are very, very ASCII-chauvinistic.  Their variable-length
encoding relies heavily on the fact that ASCII leaves the MSB of the
byte unused.  That way you can tell whether a given byte is a whole
(half-width) character or half of a full-width character.
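
To make the MSB point concrete, here is a minimal sketch in Python (not Perl, and not anything from Encode itself) that splits EUC-JP bytes into characters purely by looking at the lead byte. The function name split_euc_jp is made up for illustration, and the sketch assumes well-formed input.

```python
def split_euc_jp(data):
    """Split well-formed EUC-JP bytes into one chunk per character."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1      # MSB clear: a single ASCII (half-width) byte
        elif lead == 0x8F:
            width = 3      # SS3 prefix: three-byte JIS X 0212 character
        else:
            width = 2      # MSB set: lead byte of a two-byte character
        chars.append(data[i:i + width])
        i += width
    return chars


sample = "Aは漢字".encode("euc_jp")
pieces = split_euc_jp(sample)
decoded = [p.decode("euc_jp") for p in pieces]
```

The test on the lead byte only works because every ASCII character sits below 0x80. In EBCDIC, letters such as "A" (0xC1) already have the MSB set, so the same test would misread them as lead bytes.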

That is fine. While they are in the CJK encodings they can stay ASCII-oid.

The problem comes when we convert to perl's internal form.
An ASCII 'A' in Shift-JIS or whatever will still become 0xC1 in
an EBCDIC perl because that is "defined" to be EBCDIC perl's
view of U+0041.

So if tests convert CJK into "internal" and then just do ord(),
they will fail for the range 0..255. There are some XS functions
to map native<->Unicode numbers.
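
A rough sketch of what such a native<->Unicode mapping looks like, written in Python for illustration only. It assumes cp037 as a stand-in for the "native" EBCDIC code page (an OS/390 perl may use a different one), and the helper names unicode_to_native and native_to_unicode are illustrative, not the XS functions mentioned above.

```python
# Assumption: cp037 stands in for the platform's native EBCDIC code page.
NATIVE = "cp037"

# Lookup tables for the low 256 code points, one per direction.
UNICODE_TO_NATIVE = [chr(cp).encode(NATIVE)[0] for cp in range(256)]
NATIVE_TO_UNICODE = [ord(bytes([b]).decode(NATIVE)) for b in range(256)]


def unicode_to_native(cp):
    """Map a Unicode code point to its native number (identity above 255)."""
    return UNICODE_TO_NATIVE[cp] if cp < 256 else cp


def native_to_unicode(n):
    """Map a native number back to its Unicode code point."""
    return NATIVE_TO_UNICODE[n] if n < 256 else n


# An ASCII 'A' byte decoded from Shift-JIS is U+0041 ...
a = b"A".decode("shift_jis")
# ... but its native number on this stand-in EBCDIC platform is 0xC1.
native_a = unicode_to_native(ord(a))
```

A test that compares against Unicode numbers would pass the result of ord() through the native-to-Unicode direction first, rather than comparing the native number directly.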

  The shadow of ASCII falls even on ISO-2022, an escape-based encoding
that is not supposed to be affected by the MSB and such (only \e was
supposed to matter).  In ISO-2022, most 2-byte characters are
represented on either a 96x96 or a 94x94 grid, which is (7-bit ASCII
minus control characters) or (that minus space (0x20) and DEL (\x7F)).
  Obviously this will not work on EBCDIC....
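
As an illustration of the 94x94 grid, the following Python sketch encodes one kanji as ISO-2022-JP and maps its two payload bytes to a (row, cell) position. The function name kuten is hypothetical, and the sketch assumes the payload is a single JIS X 0208 character wrapped in the usual three-byte escape sequences.

```python
def kuten(pair):
    """Map a two-byte ISO-2022 graphic pair to (row, cell) on the 94x94 grid."""
    hi, lo = pair
    # A 94-set uses 0x21..0x7E: 7-bit ASCII minus controls, space and DEL.
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError("bytes outside the 94-set range 0x21..0x7E")
    return hi - 0x20, lo - 0x20


encoded = "漢".encode("iso2022_jp")
# Drop the leading ESC $ B and the trailing ESC ( B escape sequences.
payload = encoded[3:-3]
row, cell = kuten(payload)
```

The payload bytes here are 0x34 0x41, which are the ASCII characters "4A". The grid is defined in terms of ASCII's printable range, which is why the scheme does not carry over to EBCDIC byte values unchanged.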

Nor should it.

  This one may be tougher than we think....
  FYI I know something called 12-bit EBCDIC kanji also exists.  I only
know that it exists, but is it on our support list?

If OS390 (or ICU given its history) has tables we can probably support
them.


The test logs are attached: I would really appreciate it if you could
see some pattern in the failures.

  I will do the best I can but I will be away for this weekend and I
won't be back online till Sunday at least.

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen

Dan the Unstable according to Jack Cohen
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/


