perl-unicode

Re: Encode UTF-8 optimizations

2016-08-25 02:48:53
On Wednesday 24 August 2016 22:49:21 Karl Williamson wrote:
On 08/22/2016 02:47 PM, pali(_at_)cpan(_dot_)org wrote:

snip

I added some tests for overlong sequences. Only for ASCII platforms; tests
for EBCDIC are missing (sorry, I do not have access to any EBCDIC platform
for testing).

It's fine to skip those tests on EBCDIC.

Ok.
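
For readers of the archive, here is a minimal sketch of the kind of overlong-sequence check discussed above. It is not taken from the actual patch or from the Encode test suite. The helper name is_strict_utf8 and the sample byte strings are assumptions made for illustration, and the check is guarded so that it only runs on ASCII platforms, as agreed above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): returns 1 if $bytes decodes as
# strict "UTF-8" without error, 0 otherwise.
sub is_strict_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when CHECK is set
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# ord('A') is 65 on ASCII platforms and 193 on EBCDIC ones, so this block
# is skipped on EBCDIC.
if (ord('A') == 65) {
    my @samples = (
        [ 'U+002F, shortest form'    => "\x2F" ],
        [ 'U+002F, overlong 2 bytes' => "\xC0\xAF" ],
        [ 'U+002F, overlong 3 bytes' => "\xE0\x80\xAF" ],
    );
    for my $sample (@samples) {
        my ($label, $bytes) = @$sample;
        printf "%-26s %s\n", $label,
            is_strict_utf8($bytes) ? 'accepted' : 'rejected';
    }
}
```

The copy inside the helper is there because decode() with a CHECK argument can modify the source buffer, which would fail on a read-only string literal.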

Anyway, how does it behave on EBCDIC platforms? And maybe another question:
what should Encode::encode('UTF-8', $str) do on EBCDIC? Encode $str to
UTF-8 or to UTF-EBCDIC?

It works fine on EBCDIC platforms.  There are other bugs in Encode on
EBCDIC that I plan on investigating as time permits.  Doing this has
fixed some of these for free.  The uvuni() functions should in almost
all instances be uvchr(), and my patch does that.

Now I'm wondering whether the FBCHAR_UTF8 define also works on EBCDIC... I
think that it should be different for UTF-EBCDIC.

I'll fix that.

On EBCDIC platforms, UTF-8 is defined to be UTF-EBCDIC (or vice versa if
you prefer), so $str will effectively be in the version of UTF-EBCDIC
valid for the platform it is running on (there are differences depending
on the platform's underlying code page).

So does that mean that on EBCDIC platforms you cannot process a file which
is encoded in UTF-8? Since Encode::decode("UTF-8", $str) expects $str to be
in UTF-EBCDIC and not in UTF-8 (as I understand it).

Yes.  The two worlds do not meet.  If you are on an EBCDIC platform, the
native encoding is UTF-EBCDIC tailored to the code page the platform runs
on.

In searching, I did not find anything that converts between the two, so I
wrote a Perl script to do so.  Our OS/390 man, Yaroslav, wrote one in C.

Thank you for the information! I thought that the "UTF-8" encoding (with
hyphen) was the strict and correct UTF-8 version on both ASCII and EBCDIC
platforms, as the Encode documentation says nothing about it being
different on EBCDIC...

Anyway, if you need some help with the Encode module or anything else,
let me know, as I want to have UTF-8 support in Encode working
correctly...