perl-unicode

Need: list of Unicode characters that have canonical decompositions.

2011-06-27 09:27:13
A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are to read unicore/Decomposition.pl or to scan unicore/UnicodeData.txt. However:

Re unicore/Decomposition.pl: the header of this file says:

# !!!!!!!   INTERNAL PERL USE ONLY   !!!!!!!
# This file is for internal use by the Perl program only.  The format and even
# the name or existence of this file are subject to change without notice.
# Don't use it directly.

Re unicore/UnicodeData.txt: I recently uploaded to CPAN a version of my module that reads unicore/UnicodeData.txt, and the only reports I've received from Perl 5.14 testers are failures indicating that the file cannot be found :-(

Unicode::UCD can tell me if a specific character has a decomposition, but can't give me a list of characters that have decompositions.
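For reference, the per-character check can be turned into a list by brute force. This is only a rough sketch, not a recommended solution: it assumes that scanning every code point once is acceptable, and it uses getCanon() from the core Unicode::Normalize module rather than Unicode::UCD. The sub name canonically_decomposable is made up for illustration. Note that getCanon() also reports Hangul syllables, which are decomposed algorithmically.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(getCanon);

# Hypothetical helper: scan the whole code space and keep every code
# point for which getCanon() returns a defined value, i.e. every code
# point that has a canonical decomposition.
sub canonically_decomposable {
    my @found;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        push @found, $cp if defined getCanon($cp);
    }
    return \@found;
}

my $list = canonically_decomposable();
printf "%d code points have canonical decompositions\n", scalar @$list;
```

The scan touches about 1.1 million code points per run, so the result would probably be worth caching if it is needed more than once.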

Any suggestions would be appreciated.

Bob