RE: [xsl] XSLT script to report Unicode characters and code blocks in file?

2008-05-29 13:33:27
I wrote a transformation that uses unparsed-text() and regex processing to
create an XML version of the Unicode database; once you've got that, you can
easily look up what code block a particular character falls into because
it's part of the data for each character. (Well, most of the characters.
Some of the non-BMP entries share a single entry for a large group of
characters, which needs a bit of care.)
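
A minimal sketch of the general technique (not the transformation
described above): it assumes a local copy of the Unicode Character
Database file Blocks.txt, whose data lines look like
"0000..007F; Basic Latin" and whose comment lines start with "#". The
parameter name blocks-uri, the template name main, and the output element
names are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: a copy of Blocks.txt sits next to this stylesheet -->
  <xsl:param name="blocks-uri" select="'Blocks.txt'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point: no source document is needed -->
  <xsl:template name="main">
    <blocks>
      <!-- Read the file as text and process it one line at a time -->
      <xsl:for-each
          select="tokenize(unparsed-text($blocks-uri, 'utf-8'), '\r?\n')">
        <!-- Comment and blank lines do not match and are dropped -->
        <xsl:analyze-string select="."
            regex="^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$">
          <xsl:matching-substring>
            <block first="{regex-group(1)}"
                   last="{regex-group(2)}"
                   name="{normalize-space(regex-group(3))}"/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </blocks>
  </xsl:template>

</xsl:stylesheet>
```

This needs a processor that can start at a named template (Saxon's
command line accepts -it:main, for example). The first and last attributes
stay as hexadecimal strings here; a lookup against the output of
string-to-codepoints() would still have to convert them to integers.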

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: David Sewell [mailto:dsewell(_at_)virginia(_dot_)edu] 
Sent: 29 May 2008 20:45
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XSLT script to report Unicode characters and code blocks in file?

I'm working on a simple XSLT 2.0 script to list all distinct 
Unicode characters used in a file. That part of the script 
takes very few lines, thanks to distinct-values(), 
codepoints-to-string(), and string-to-codepoints().
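
A minimal sketch of what such a listing could look like, assuming the
characters of interest are those in the document's text nodes (string(/)
does not include attribute values) and that one "codepoint, tab,
character" line per distinct character is an acceptable report format:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One integer per distinct character in the text content -->
    <xsl:for-each
        select="distinct-values(string-to-codepoints(string(/)))">
      <!-- The items are integers, so this sorts numerically -->
      <xsl:sort select="."/>
      <!-- Decimal code point, a tab, the character, a newline -->
      <xsl:value-of
          select="concat(., '&#9;', codepoints-to-string(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Whitespace characters such as newline and tab are written out literally in
this sketch, so their lines look empty in the report.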

However, I'd also like to group the output by code block:

http://www.fileformat.info/info/unicode/block/index.htm

Best way I can see to do it is to write a local function that 
tests the codepoint value and uses lots and lots of 
<xsl:when> case tests to determine which block the character 
falls into. Not hard but a bit tedious. Has anyone invented 
this wheel already?
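
A minimal sketch of that xsl:when approach, covering only six blocks to
show the shape. The function name my:block-name and its namespace URI are
made up for illustration, and the ranges are the decimal equivalents of
the published block boundaries (for example, Basic Latin is U+0000 to
U+007F, i.e. 0 to 127):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/local"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: maps a code point to a block name -->
  <xsl:function name="my:block-name" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:choose>
      <!-- The first four tests rely on being evaluated in order -->
      <xsl:when test="$cp le 127">
        <xsl:sequence select="'Basic Latin'"/>
      </xsl:when>
      <xsl:when test="$cp le 255">
        <xsl:sequence select="'Latin-1 Supplement'"/>
      </xsl:when>
      <xsl:when test="$cp le 383">
        <xsl:sequence select="'Latin Extended-A'"/>
      </xsl:when>
      <xsl:when test="$cp le 591">
        <xsl:sequence select="'Latin Extended-B'"/>
      </xsl:when>
      <xsl:when test="$cp ge 880 and $cp le 1023">
        <xsl:sequence select="'Greek and Coptic'"/>
      </xsl:when>
      <xsl:when test="$cp ge 1024 and $cp le 1279">
        <xsl:sequence select="'Cyrillic'"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="'(block not covered in this sketch)'"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
    <!-- Group the distinct code points by block name -->
    <xsl:for-each-group
        select="distinct-values(string-to-codepoints(string(/)))"
        group-by="my:block-name(.)">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="codepoints-to-string(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Groups come out in order of first appearance; an xsl:sort inside the
xsl:for-each-group would be needed for any other order.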

DS

--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell(_at_)virginia(_dot_)edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


