I have just released Encode-InCharset-0.01, available as
http://www.dan.co.jp/~dankogai/Encode-InCharset-0.01.tar.gz and CPAN.
I have developed this module primarily to implement ISO-2022-JP-3 and
ISO-2022-CN in future. To implement encode() in these, you have to know
which character set a given character belongs. But this module can also
be used if a string can safely be encoded
(Though fallback is much faster).
Dan the Encode Maintainer
NAME
Encode::InCharset - defines \p{InCharset}
INSTALL
perl Makefile.PL
make test && make install
SYNOPSIS
use Encode::InCharset qw(InJIS0208);
"I am \x{5c0f}\x{98fc}\x{3000}\x{5f3e}" =~ /(\p{InJIS0208})+/o;
# guess what is in $1
ABSTRACT
This module provides In*Charset* Unicode property that matches
characters *Charset*.
As of this writing, Property-matching functions are auto-generated
out
of ucm files in Encode, Encode::HanExtra, and Encode::JIS2K.
DESCRIPTION
As of this writing, this module supports character properties shown
below. Since names are self-explanatory I am not going to discuss in
details.
InASCII InAdobeStandardEncoding InAdobeSymbol InAdobeZdingbat
InBIG5EXT InBIG5PLUS InBIG5_ETEN InBIG5_HKSCS InCCCII InCP1006
InCP1026 InCP1047 InCP1250 InCP1251 InCP1252 InCP1253 InCP1254
InCP1255 InCP1256 InCP1257 InCP1258 InCP37 InCP424 InCP437 InCP500
InCP737 InCP775 InCP850 InCP852 InCP855 InCP856 InCP857 InCP860
InCP861 InCP862 InCP863 InCP864 InCP865 InCP866 InCP869 InCP874
InCP875 InCP932 InCP936 InCP949 InCP950 InDingbats InEUC_CN
InEUC_JISX0213 InEUC_JP InEUC_KR InEUC_TW InGB12345 InGB18030
InGB2312
InGSM0338 InHp_Roman8 InISO_8859_1 InISO_8859_10 InISO_8859_11
InISO_8859_13 InISO_8859_14 InISO_8859_15 InISO_8859_16
InISO_8859_2
InISO_8859_3 InISO_8859_4 InISO_8859_5 InISO_8859_6 InISO_8859_7
InISO_8859_8 InISO_8859_9 InISO_IR_165 InJIS0201 InJIS0208
InJIS0212
InJIS0213_1 InJIS0213_2 InJohab InKOI8_F InKOI8_R InKOI8_U
InKSC5601
InMacArabic InMacCentralEurRoman InMacChineseSimp InMacChineseTrad
InMacCroatian InMacCyrillic InMacDingbats InMacFarsi InMacGreek
InMacHebrew InMacIcelandic InMacJapanese InMacKorean InMacRoman
InMacRomanian InMacRumanian InMacSami InMacSymbol InMacThai
InMacTurkish InMacUkrainian InNextstep InPOSIX_BC InShift_JIS
InShift_JISX0213 InSymbol InVISCII
EXPORT
# will import all of them
use Encode::InCharset;
# will import only properties in qw()
use Encode::InCharset qw(In<Charset>...)
SEE ALSO
the Encode manpage, the perlunicode manpage
AUTHOR
Dan Kogai <dankogai(_at_)home(_dot_)dan(_dot_)intra>
COPYRIGHT AND LICENSE
Copyright 2002 by Dan Kogai
This library is free software; you can redistribute it and/or modify
it
under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html