Hello,
I think regular expressions would help. It has been a while since I
had to deal with chemical elements, so I am not sure I completely
understand your requirements, but the following stylesheet gives the
expected result on your sample data:
<xsl:template match="list">
  <xsl:for-each select="*">
    <xsl:sort select="ms:molSort2(.)"/>
    <xsl:copy-of select="."/>
  </xsl:for-each>
</xsl:template>

<xsl:function name="ms:molSort2">
  <xsl:param name="node"/>
  <!-- take out unwanted characters and only keep letters and numbers -->
  <xsl:variable name="filter">
    <xsl:analyze-string select="string($node)" regex="[A-Za-z0-9]+">
      <xsl:matching-substring>
        <xsl:value-of select="."/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
  <!-- does two things: pads numbers, and transforms letters to their
       code, so that at the end we only have a long string of numbers -->
  <xsl:variable name="sortString">
    <xsl:analyze-string select="$filter" regex="\d+">
      <xsl:matching-substring><!-- this is a number -->
        <xsl:value-of select="format-number(number(.), '000')"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring><!-- (at this point) this is a character -->
        <xsl:value-of select="string-to-codepoints(.)"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
  <xsl:value-of select="$sortString"/>
</xsl:function>
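One caveat with the function above: because the filter step joins the
matches together, digits on either side of a comma run into each other
(the "4,7" in H2SO4,7H2O is read as 47), and the letter codes have
different lengths (H is 72, l is 108), so runs of letters do not always
compare alphabetically. If that matters for your data, something along
these lines may be more robust. It is only a sketch: the ms namespace
URI and the name ms:molKey are made up for the example, and I am
assuming that no number has more than five digits and that comparing
letters by codepoint is acceptable.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ms="http://example.com/molsort"
    exclude-result-prefixes="xs ms">
  <!-- NOTE: the ms namespace URI above is a placeholder, not a real one -->

  <xsl:template match="list">
    <xsl:for-each select="*">
      <!-- codepoint collation: every letter sorts before '~' -->
      <xsl:sort select="ms:molKey(.)"
          collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>

  <!-- ms:molKey is a hypothetical helper: it builds one string key -->
  <xsl:function name="ms:molKey" as="xs:string">
    <xsl:param name="node" as="node()"/>
    <xsl:variable name="tokens" as="xs:string*">
      <!-- split into runs of digits and runs of letters; brackets,
           commas and spaces fall into the non-matching parts and are
           dropped, but they still keep neighbouring numbers apart -->
      <xsl:analyze-string select="string($node)" regex="[0-9]+|[A-Za-z]+">
        <xsl:matching-substring>
          <!-- numbers: '~' prefix (sorts after letters) plus
               zero-padding to five digits (assumed to be enough) -->
          <xsl:sequence select="if (matches(., '^[0-9]'))
                                then concat('~', format-number(number(.), '00000'))
                                else ."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($tokens, '')"/>
  </xsl:function>

</xsl:stylesheet>
```

With your four sample items the keys start with CH, CN, C~00004 and
C~00019, so this should give CHCl3, CNa3O5P, C4H7Cl3O2 and then the
bracketed formula. Note that the codepoint collation also puts all
upper-case letters before lower-case ones.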
Hope this helps.
Regards,
EB
On Wed, Nov 24, 2010 at 5:55 PM, Emma Burrows
<Emma(_dot_)Burrows(_at_)rpharms(_dot_)com> wrote:
Hello,
Using Saxon 9.2 and XSLT 2.0, I am currently sorting a list of chemical
formulae which appears in the following format:
<list>
<item1>(C<sub>19</sub>H<sub>22</sub>N<sub>2</sub>O)<sub>2</sub>,H<sub>2</sub>SO<sub>4</sub>,7H<sub>2</sub>O</item1>
<item1>C<sub>4</sub>H<sub>7</sub>Cl<sub>3</sub>O<sub>2</sub></item1>
<item1>CHCl<sub>3</sub></item1>
<item1>CNa<sub>3</sub>O<sub>5</sub>P </item1>
</list>
The desired sort order is:
CHCl3
CNa3O5P
C4H7Cl3O2
(C19H22N2O)2,H2SO4,7H2O
So the rules are
a. ignore brackets
b. sort letters before numbers
c. sort numbers numerically
Using the following templates, I've managed to get as far as a and b, but I
need a little help adding c to the mix:
<xsl:template match="list">
  <xsl:for-each select="item1">
    <xsl:sort select="rps:molSort(.)" case-order="upper-first"/>
    <xsl:copy-of select="."/>
  </xsl:for-each>
</xsl:template>

<xsl:function name="rps:molSort" as="xs:string">
  <xsl:param name="node"/>
  <xsl:variable name="step1" select="replace(replace($node, '\(',''), '\)','')"/>
  <xsl:variable name="step2" select="replace(replace($step1, '\[',''), '\]','')"/>
  <xsl:variable name="step3"
    select="translate($step2,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')"/>
  <xsl:value-of select="$step3"/>
</xsl:function>
This produces the following output:
CHCl3
CNa3O5P
(C19H22N2O)2,H2SO4,7H2O
C4H7Cl3O2
In other words, numbers are sorted as letters rather than numbers, so the
subscripts go "1 10 11 2 3.." instead of "1 2 3... 10 11". I need an
additional criterion somewhere to sort the numbers correctly but I haven't
found a solution that works yet, so a nudge in the right direction would be
great.
Thank you!
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--