xsl-list

Re: [xsl] Sorting chemical formulae in XSLT 2.0

2010-11-25 03:18:52
Hello,

I think a regular expression would help. It has been a while since I last
had to deal with chemical formulae, so I am not sure I completely
understand your requirements, but the following stylesheet gives the
expected result for your sample:

<xsl:template match="list">
  <xsl:for-each select="*">
    <xsl:sort select="ms:molSort2(.)"/>
    <xsl:copy-of select="."/>
  </xsl:for-each>
</xsl:template>

<xsl:function name="ms:molSort2">
  <xsl:param name="node"/>
  <!-- take out unwanted characters and only keep letters and numbers -->
  <xsl:variable name="filter">
    <xsl:analyze-string select="string($node)" regex="[A-Za-z0-9]+">
      <xsl:matching-substring>
        <xsl:value-of select="."/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
  <!-- does two things: pads numbers, and transforms letters to their
       code, so that at the end we only have a long string of numbers -->
  <xsl:variable name="sortString">
    <xsl:analyze-string select="$filter" regex="\d+">
      <xsl:matching-substring><!-- this is a number -->
        <xsl:value-of select="format-number(number(.), '000')"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring><!-- (at this point) this is a character -->
        <xsl:value-of select="string-to-codepoints(.)"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
  <xsl:value-of select="$sortString"/>
</xsl:function>
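
To see why a key built this way sorts numerically, it can help to look at the two building blocks in isolation. The following is a minimal sketch and not part of the stylesheet above; the template name `main` and the sample subscripts are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point: matches any input document and can also be
       called by name (for example with -it:main in Saxon). -->
  <xsl:template match="/" name="main">
    <!-- Sample subscripts, invented for this demonstration. -->
    <xsl:variable name="subscripts" select="('19', '4', '2', '10')"/>

    <!-- Compared as plain strings, "10" and "19" sort before "2" and "4". -->
    <xsl:text>unpadded:</xsl:text>
    <xsl:for-each select="$subscripts">
      <xsl:sort select="."/>
      <xsl:value-of select="concat(' ', .)"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- Zero-padded to a fixed width, string order matches numeric order. -->
    <xsl:text>padded:</xsl:text>
    <xsl:for-each select="$subscripts">
      <xsl:sort select="format-number(number(.), '000')"/>
      <xsl:value-of select="concat(' ', .)"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- Letters become their code points; xsl:value-of joins the resulting
         sequence with single spaces by default. -->
    <xsl:text>code points of "Cl": </xsl:text>
    <xsl:value-of select="string-to-codepoints('Cl')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

This should print `unpadded: 10 19 2 4`, `padded: 2 4 10 19` and `code points of "Cl": 67 108`. Note that a picture of '000' only keeps string order and numeric order aligned for numbers below 1000.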

Hope this helps.
Regards,
EB


On Wed, Nov 24, 2010 at 5:55 PM, Emma Burrows 
<Emma(_dot_)Burrows(_at_)rpharms(_dot_)com> wrote:
Hello,

Using Saxon 9.2 and XSLT 2.0, I am currently sorting a list of chemical 
formulae which appears in the following format:

<list>
  <item1>(C<sub>19</sub>H<sub>22</sub>N<sub>2</sub>O)<sub>2</sub>,H<sub>2</sub>SO<sub>4</sub>,7H<sub>2</sub>O</item1>
  <item1>C<sub>4</sub>H<sub>7</sub>Cl<sub>3</sub>O<sub>2</sub></item1>
  <item1>CHCl<sub>3</sub></item1>
  <item1>CNa<sub>3</sub>O<sub>5</sub>P </item1>
</list>

The desired sort order is:

CHCl3
CNa3O5P
C4H7Cl3O2
(C19H22N2O)2,H2SO4,7H2O

So the rules are
a. ignore brackets
b. sort letters before numbers
c. sort numbers numerically
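
The three rules above can be sketched as a single string sort key. This is only an illustration under assumptions, not the approach used elsewhere in this thread: the `ex` prefix, its namespace URI and the function name `ex:formula-key` are invented, subscripts are assumed to stay below 1000, and the key relies on the codepoint collation, in which `~` sorts after every ASCII letter.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/ns/formula-sort"
    exclude-result-prefixes="xs ex">

  <xsl:template match="list">
    <xsl:copy>
      <xsl:for-each select="*">
        <!-- Codepoint collation makes the position of '~' predictable. -->
        <xsl:sort select="ex:formula-key(.)"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: turns a formula into a comparable string. -->
  <xsl:function name="ex:formula-key" as="xs:string">
    <xsl:param name="formula" as="item()"/>
    <xsl:variable name="parts" as="xs:string*">
      <!-- Rule a: brackets, commas and spaces match neither alternative
           and are dropped, because there is no non-matching branch. -->
      <xsl:analyze-string select="string($formula)" regex="[0-9]+|[A-Za-z]+">
        <xsl:matching-substring>
          <!-- Rule b: '~' pushes numbers after letters.
               Rule c: zero padding makes string order numeric. -->
          <xsl:sequence select="if (matches(., '^[0-9]+$'))
                                then concat('~', format-number(number(.), '000'))
                                else ."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

</xsl:stylesheet>
```

For the sample list the keys would be `CHCl~003`, `CNa~003O~005P`, `C~004H~007Cl~003O~002` and a key beginning `C~019`, which in codepoint order gives the desired sequence. Uppercase letters sort before lowercase ones under this collation, a detail the rules do not address.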

Using the following templates, I've managed to get as far as a and b, but I 
need a little help adding c to the mix:

<xsl:template match="list">
  <xsl:for-each select="item1">
    <xsl:sort select="rps:molSort(.)" case-order="upper-first"/>
    <xsl:copy-of select="."/>
  </xsl:for-each>
</xsl:template>

<xsl:function name="rps:molSort" as="xs:string">
   <xsl:param name="node"/>
   <xsl:variable name="step1" select="replace(replace($node, '\(',''), '\)','')"/>
   <xsl:variable name="step2" select="replace(replace($step1, '\[',''), '\]','')"/>
   <xsl:variable name="step3"
      select="translate($step2,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')"/>
   <xsl:value-of select="$step3"/>
</xsl:function>

This produces the following output:
CHCl3
CNa3O5P
(C19H22N2O)2,H2SO4,7H2O
C4H7Cl3O2

In other words, numbers are sorted as letters rather than numbers, so the 
subscripts go "1 10 11 2 3.." instead of "1 2 3... 10 11". I need an 
additional criterion somewhere to sort the numbers correctly but I haven't 
found a solution that works yet, so a nudge in the right direction would be 
great.

Thank you!


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



