On Thursday, Sep 19, 2002, at 11:39 Asia/Tokyo, Autrijus Tang wrote:
Hi there. Recently I needed to do some hacking based on the
EastAsianWidth property (cf. http://www.unicode.org/unicode/reports/tr11/)
of Unicode characters. Naturally, I tried the regular expression \p{}
and \P{} syntax, to no avail.
Come to think of EastAsianWidth, I needed that property when I wrote
unidump (under Encode/bin, not installed by default). It looks as
follows:
# Generated out of lib/unicore/EastAsianWidth.txt
# will it work ?
#
our $IsFullWidth =
qr/^[
\x{1100}-\x{1159}
\x{115F}-\x{115F}
\x{2329}-\x{232A}
\x{2E80}-\x{2E99}
\x{2E9B}-\x{2EF3}
\x{2F00}-\x{2FD5}
\x{2FF0}-\x{2FFB}
\x{3000}-\x{303E}
\x{3041}-\x{3096}
\x{3099}-\x{30FF}
\x{3105}-\x{312C}
\x{3131}-\x{318E}
\x{3190}-\x{31B7}
\x{31F0}-\x{321C}
\x{3220}-\x{3243}
\x{3251}-\x{327B}
\x{327F}-\x{32CB}
\x{32D0}-\x{32FE}
\x{3300}-\x{3376}
\x{337B}-\x{33DD}
\x{33E0}-\x{33FE}
\x{3400}-\x{4DB5}
\x{4E00}-\x{9FA5}
\x{A000}-\x{A48C}
\x{A490}-\x{A4C6}
\x{AC00}-\x{D7A3}
\x{F900}-\x{FA2D}
\x{FA30}-\x{FA6A}
\x{FE30}-\x{FE46}
\x{FE49}-\x{FE52}
\x{FE54}-\x{FE66}
\x{FE68}-\x{FE6B}
\x{FF01}-\x{FF60}
\x{FFE0}-\x{FFE6}
\x{20000}-\x{2A6D6}
]$/xo;
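
One caveat on the pattern above: /x does not strip whitespace inside a bracketed character class, so the line breaks between the ranges become members of the class and "\n" itself would match. The following is a minimal sketch, not the actual unidump code, that builds the class from (start, end) pairs to avoid this and then uses it to count display columns. The names @wide_ranges and display_width are hypothetical, and only a few of the ranges listed above are included.

```perl
use strict;
use warnings;

# Hypothetical table: a few (start, end) pairs taken from the list above.
# A real version would carry every range from EastAsianWidth.txt.
my @wide_ranges = (
    [0x1100, 0x1159], [0x3000, 0x303E], [0x3041, 0x3096],
    [0x4E00, 0x9FA5], [0xAC00, 0xD7A3], [0xFF01, 0xFF60],
);

# Build the class body with no literal whitespace in it, because /x
# does not ignore whitespace inside [...].
my $class   = join '', map { sprintf '\x{%X}-\x{%X}', @$_ } @wide_ranges;
my $is_wide = qr/^[$class]$/;

# Count display columns: 2 for a wide character, 1 for anything else.
sub display_width {
    my ($str) = @_;
    my $width = 0;
    $width += ($_ =~ $is_wide) ? 2 : 1 for split //, $str;
    return $width;
}

print display_width("abc"), "\n";                  # three narrow characters
print display_width("\x{65E5}\x{672C}a"), "\n";    # two wide plus one narrow
```

This treats every character outside the table as one column wide, which is a simplification: combining and control characters would need separate handling in a real hexdump.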
Naturally, I can hack up a local patch to unicore/{Canonical,Exact}.pl
and parse the yet-unused unicore/EastAsianWidth.txt to add the desired
properties in, namely (better names welcome):
\p{En} \p{EastAsianNeutral}
\p{Ea} \p{EastAsianAmbiguous}
\p{Eh} \p{EastAsianHalfwidth}
\p{Ew} \p{EastAsianWide}
\p{Ef} \p{EastAsianFullwidth}
\p{Ena} \p{EastAsianNarrow}
But as it overrides core modules' behaviour, I'd hesitate to release
it as a CPAN module (Unicode::EastAsianWidth), and would rather suggest
that it be included in core perl.
Are there any hidden drawbacks or other problems with this idea?
Full/Half width was not supposed to be a part of character encoding
ideally but we all know we need that in practice, especially when you
need to render those chars nice and tidy in fixed-width fonts (that's
why I came up w/ a quick and dirty hack above -- it's a unicode-savvy
hexdump). So I second the idea of adding East Asian Width properties
SOMEHOW.
I said somehow because I am not so sure that it requires tweaking the
core. I think we can reach the goal in the same manner as my humble
Encode::InCharset, a module I declined to add to Encode.
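
As a rough illustration of doing this without touching the core, here is a sketch that assumes Perl's user-defined character property mechanism: a sub whose name starts with "In" or "Is" and which returns lines of hexadecimal "START<TAB>END". The property name InEastAsianWide is hypothetical, and only a handful of the ranges from the table above are shown.

```perl
use strict;
use warnings;

# Hypothetical user-defined property. Perl resolves \p{InEastAsianWide}
# by calling this sub, which returns one "START\tEND\n" line per range,
# with code points written in hexadecimal.
sub InEastAsianWide {
    return join '',
        "1100\t1159\n",
        "3000\t303E\n",
        "3041\t3096\n",
        "4E00\t9FA5\n",
        "AC00\tD7A3\n",
        "FF01\tFF60\n";
}

my $s = "\x{65E5}\x{672C}abc";

# Count the characters in $s that fall inside the property.
my $wide = () = $s =~ /\p{InEastAsianWide}/g;
print "$wide wide character(s)\n";

# \P{...} is the complement, so the narrow rest can be picked out too.
(my $narrow = $s) =~ s/\p{InEastAsianWide}//g;
print "narrow part: $narrow\n";
```

A module could export such subs into the caller's package, so that the properties become available on request and no core file is overridden.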
Dan the Man with Too Many Character Properties to Remember, Too Few to
Feel Practical