I wrote a transformation that uses unparsed-text() and regex processing to
create an XML version of the Unicode database; once you've got that, you can
easily look up what code block a particular character falls into because
it's part of the data for each character. (Well, most of the characters.
Some of the non-BMP entries share a single entry for a large group of
characters, which needs a bit of care.)
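
For illustration, here is a minimal sketch of the same general technique (unparsed-text() plus regex processing), not the transformation described above. It makes a simplifying assumption: instead of converting the whole character database, it reads only the Unicode Consortium's Blocks.txt file, whose data lines look like "0000..007F; Basic Latin". The file location ($blocks-uri), the f: namespace and the function names f:hex-to-int and f:block-name are all hypothetical names chosen for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:unicode-blocks"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Assumption: a local copy of Blocks.txt sits next to the stylesheet. -->
  <xsl:param name="blocks-uri" select="'Blocks.txt'"/>

  <!-- Convert a hexadecimal string such as '007F' to an integer. -->
  <xsl:function name="f:hex-to-int" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      if ($hex = '') then 0
      else 16 * f:hex-to-int(substring($hex, 1, string-length($hex) - 1))
           + string-length(substring-before('0123456789ABCDEF',
               upper-case(substring($hex, string-length($hex)))))"/>
  </xsl:function>

  <!-- One <block> element per data line; comment and blank lines do not match. -->
  <xsl:variable name="blocks" as="element(block)*">
    <xsl:analyze-string select="unparsed-text($blocks-uri, 'utf-8')"
        regex="^([0-9A-F]+)\.\.([0-9A-F]+);\s*([^\r\n]+)" flags="m">
      <xsl:matching-substring>
        <block name="{normalize-space(regex-group(3))}"
               start="{f:hex-to-int(regex-group(1))}"
               end="{f:hex-to-int(regex-group(2))}"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <!-- Name of the block containing a codepoint, or 'No_Block' if none does. -->
  <xsl:function name="f:block-name" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
      ($blocks[xs:integer(@start) le $cp and $cp le xs:integer(@end)]
         /@name/string(), 'No_Block')[1]"/>
  </xsl:function>

  <!-- Group the distinct characters of the input's text content by block. -->
  <xsl:template match="/">
    <xsl:for-each-group
        select="distinct-values(string-to-codepoints(string(.)))"
        group-by="f:block-name(.)">
      <xsl:sort select="min(current-group())"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="for $cp in current-group()
                            return codepoints-to-string($cp)" separator=" "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Because Blocks.txt lists ranges rather than individual characters, this sketch sidesteps the shared-entry problem mentioned above, but it also yields only the block name and none of the other per-character properties.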
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: David Sewell [mailto:dsewell(_at_)virginia(_dot_)edu]
Sent: 29 May 2008 20:45
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XSLT script to report Unicode characters and code blocks in file?
I'm working on a simple XSLT 2.0 script to list all distinct
Unicode characters used in a file. That part of the script
takes very few lines, thanks to distinct-values(),
codepoints-to-string(), and string-to-codepoints().
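
A minimal sketch of that first part might look like the following. It is an illustration, not the script referred to above, and it assumes that only the text content of the input matters: string(.) on the document node does not include attribute values.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Each distinct codepoint in the document's text, in ascending order. -->
    <xsl:for-each select="distinct-values(string-to-codepoints(string(.)))">
      <xsl:sort select="."/>
      <!-- Decimal codepoint, a tab, then the character itself. -->
      <xsl:value-of select="., codepoints-to-string(.)" separator="&#9;"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```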
However, I'd also like to group the output by code block:
http://www.fileformat.info/info/unicode/block/index.htm
Best way I can see to do it is to write a local function that
tests the codepoint value and uses lots and lots of
<xsl:when> case tests to determine which block the character
falls into. Not hard but a bit tedious. Has anyone invented
this wheel already?
DS
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell(_at_)virginia(_dot_)edu Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--