perl-unicode

Encode, charnames and utf8heavy

2002-05-01 06:58:14
On Wednesday, May 1, 2002, at 10:30, Jarkko Hietaniemi wrote:
Thanks, upgraded.

A bit of noise from ext/PerlIO/t/fallback.t:

./perl -Ilib ext/PerlIO/t/fallback.t
1..8
ok 1 - opened iso-8859-1 file
"\N{U+20ac}" does not map to iso-8859-1 at ext/PerlIO/t/fallback.t line 21.
ok 2 - perlqq escapes
ok 3 - opened iso-8859-1 file
ok 4 - HTML escapes
ok 5 - Opened as ASCII
# 5c
ok 6 - Escaped non-mapped char
ok 7 - Opened as ASCII
# fffd
ok 8 - Unicode replacement char

Also, is it intentional that there is no \N{U+HHHH} syntax...?
That was planned at some point, but as of now there is no such thing.

Okay, I'll change the error message in the next release so that it says

"\x{abcd}" does not map to iso-8859-1 at ext/PerlIO/t/fallback.t line 21.

Autrijus just sent me a patch so it won't take long.

./perl -Ilib -Ilib -Mcharnames=:full -e '"\N{U+20ac}"'
Unknown charname 'U+20ac' at lib/unicore/Name.pl line 1

Why not just use \x{HHHH...}?  If that's PERLQQ, that's what
I would expect?
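
For reference, here is a minimal sketch of the two escaping fallbacks that the test output above names ("perlqq escapes" and "HTML escapes"). It assumes a current Encode in which FB_PERLQQ and FB_HTMLCREF are accepted as the CHECK argument of encode(); the variable names are only illustrative, and this is not the code in fallback.t.

```perl
use strict;
use warnings;
use Encode qw(encode);

# EURO SIGN has no mapping in iso-8859-1, so the CHECK argument
# decides what the encoder emits in its place.
my $euro = "\x{20ac}";

# FB_PERLQQ replaces the unmappable character with a Perl-style escape.
my $perlqq = encode('iso-8859-1', $euro, Encode::FB_PERLQQ());

# FB_HTMLCREF replaces it with a decimal HTML character reference.
my $html = encode('iso-8859-1', $euro, Encode::FB_HTMLCREF());

print "$perlqq\n";
print "$html\n";
```

With the PERLQQ fallback the escape comes out in \x{HHHH} form, which is why an error message in the same form is the less surprising choice.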

Speaking of charnames and utf8heavy, charnames::viacode() is incredibly slow. I tried to use it extensively to pretty-comment ucm files, gave up, and used a quicker and dirtier approach originally by NI-XS. I also don't really like how unicore/ is laid out. We could at least make use of AnyDBM_File (the key-value pairs needed there are entirely SDBM_File safe, so we can safely use it!) or, if we can spend more memory, Storable.
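
To make the DBM idea concrete, here is a rough sketch of a code-point-to-name table kept in an SDBM file. It is only an illustration under assumptions: it ties SDBM_File directly instead of going through AnyDBM_File so that the backend is predictable, it fills the table from charnames::viacode() for a small sample of code points, and the file name "names" and the four-digit hex key format are made up for the example. It is not a proposal for the actual unicore/ layout.

```perl
use strict;
use warnings;
use charnames ();
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Keep the DBM files in a scratch directory that is removed on exit.
my $dir = tempdir(CLEANUP => 1);

tie my %name_of, 'SDBM_File', "$dir/names", O_RDWR | O_CREAT, 0666
    or die "cannot tie SDBM file: $!";

# Pay the slow lookup once per code point while building the table.
# Keys and values are short, well under SDBM's per-record size limit.
for my $cp (0x20 .. 0x7e, 0x20ac) {
    my $name = charnames::viacode($cp);
    $name_of{ sprintf '%04X', $cp } = $name if defined $name;
}

# Later lookups are plain hash reads against the DBM file.
my $euro_name = $name_of{'20AC'};
print "U+20AC is $euro_name\n";

untie %name_of;
```

A Storable version would look much the same, except that the whole hash would be loaded into memory with retrieve() instead of being read from disk key by key.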

return <<'END'
0       FFFF
END

is totally counterintuitive, and the whitespace in between must be exactly a single '\t', which sucks (I spent a while annoyed, wondering why my test script for InMyOwnDefinition didn't work as expected).
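
One way to sidestep the tab problem is to generate the range list instead of typing it, so the separator is always an explicit "\t". The sketch below assumes the user-defined property mechanism (a sub whose name starts with In or Is and which returns "START\tEND" lines in hex); ranges_to_property() is a hypothetical helper, and the A-Z range is chosen only to keep the example easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper: turn [start, end] pairs into the text a
# user-defined property sub must return, one "HEX<TAB>HEX" line per range.
sub ranges_to_property {
    return join '', map { sprintf "%04X\t%04X\n", @$_ } @_;
}

# User-defined property covering LATIN CAPITAL LETTER A through Z.
sub InMyOwnDefinition {
    return ranges_to_property([0x0041, 0x005A]);
}

my $upper_matches = ('A' =~ /\p{InMyOwnDefinition}/) ? 1 : 0;
my $lower_matches = ('a' =~ /\p{InMyOwnDefinition}/) ? 1 : 0;

print "A matches: $upper_matches\n";
print "a matches: $lower_matches\n";
```

Because the tab is produced by sprintf, an editor that silently converts tabs to spaces cannot break the definition.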

I would like to make this a 5.8.1 todo of mine.....

Dan the Encode Maintainer