-----Original Message-----
From: Michael Kay
This kind of thing is very much easier using XSLT 2.0:
* use the unparsed-text() function to read the text file
* split it into individual lines using the tokenize() function
* parse each line using xsl:analyze-string
* arrange it into a hierarchical structure using xsl:for-each-group
The structure below is incomplete, and I couldn't get Saxon to accept an
escaped hyphen in a character class, but it may be of help.
Input file:
H-A-HEADER some content
I-AN-ITEM-1 more content
I-AN-ITEM-2 and again
S-A-SUMMARY-1 for variety
I-AN-ITEM-3 and change
S-A-SUMMARY-2 and different again
Stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:template match="/">
    <!-- Read the whole text file into a single string -->
    <xsl:variable name="f"
                  select="unparsed-text('unparsedEntity.txt','utf-8')"/>
    <someRoot>
      <!-- One record per line of the text file -->
      <xsl:for-each select='tokenize($f, "\n")'>
        <record>
          <!-- Runs of letters and digits become words; the rest is other -->
          <xsl:analyze-string regex="[a-zA-Z0-9]+" select=".">
            <xsl:matching-substring>
              <word><xsl:value-of select="."/></word>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <other>
                <xsl:value-of select="."/>
              </other>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </record>
      </xsl:for-each>
    </someRoot>
  </xsl:template>
</xsl:stylesheet>
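The xsl:for-each-group step from the list is not covered by the stylesheet above. A minimal sketch of how it might look follows. It assumes a hierarchy that nobody has actually specified, namely that each S line closes a group, and the names line, group and the type attribute are made up for illustration. The lines are first wrapped in temporary elements because group-ending-with takes a pattern, so the things being grouped have to be nodes rather than strings.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:template match="/">
    <!-- Temporary tree: one hypothetical <line> element per non-blank line,
         with the leading letter (H, I or S) kept as a type attribute -->
    <xsl:variable name="lines">
      <xsl:for-each
          select="tokenize(unparsed-text('unparsedEntity.txt', 'utf-8'),
                           '\r?\n')[normalize-space(.)]">
        <line type="{substring(., 1, 1)}">
          <xsl:value-of select="."/>
        </line>
      </xsl:for-each>
    </xsl:variable>
    <someRoot>
      <!-- Assumption: an S (summary) line ends the current group -->
      <xsl:for-each-group select="$lines/line"
                          group-ending-with="line[@type = 'S']">
        <group>
          <xsl:for-each select="current-group()">
            <record type="{@type}">
              <xsl:value-of select="."/>
            </record>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </someRoot>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input this should give two groups, the first running from the H line to S-A-SUMMARY-1 and the second from I-AN-ITEM-3 to S-A-SUMMARY-2. If the header should instead sit on its own level, group-starting-with on the H line would be the thing to try.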
Changing the regex in the stylesheet to
regex="[\-a-zA-Z0-9]+"
failed to select any matches, yet
http://www.w3.org/TR/xmlschema-2/#regexs
seems to make it valid?
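One thing that might be worth trying is to avoid the backslash altogether. As far as I read the schema regex rules, a hyphen is taken literally when it is the first or last character of a positive character group, so [a-zA-Z0-9-] should need no escape. The sketch below is untested against the Saxon version in question, and it says nothing about why the escaped form failed. It uses one line of the sample input as a literal string rather than reading the file.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:template match="/">
    <record>
      <!-- Hyphen placed last in the character class, so no backslash -->
      <xsl:analyze-string select="'H-A-HEADER some content'"
                          regex="[a-zA-Z0-9-]+">
        <xsl:matching-substring>
          <word><xsl:value-of select="."/></word>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <other><xsl:value-of select="."/></other>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </record>
  </xsl:template>
</xsl:stylesheet>
```

If the processor treats the trailing hyphen as a literal, H-A-HEADER should come out as a single word element instead of being split at each hyphen.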
HTH DaveP