Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing
2012-08-19 18:21:24
That's what I was about to post (using f:signature as what I called hash):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="f"
version="2.0"
exclude-result-prefixes="f xs"
>
<xsl:output method="xml" indent="yes" />
<xsl:function name="f:signature">
<xsl:param name="map" as="element(map)" />
<xsl:variable name="sorted-map" as="element(singletonMap)*">
<xsl:perform-sort select="$map/*">
<xsl:sort select="@from"/>
<xsl:sort select="@to"/>
</xsl:perform-sort>
</xsl:variable>
<xsl:sequence select="string-join($sorted-map/concat(@from, '~',
@to), '|')"/>
</xsl:function>
<xsl:template match="maps">
<xsl:for-each-group select="map" group-by="f:signature(.)">
<xsl:sequence select="." />
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
Input (5 maps):
<?xml version="1.0" encoding="utf-8"?>
<maps>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B707" to="ORD"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B707" to="BER"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B707" to="ORD"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B767" to="ORD"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
</maps>
Output (4 maps):
<?xml version="1.0" encoding="UTF-8"?>
<maps>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B707" to="ORD"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B707" to="BER"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B767" to="ORD"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
</maps>
On 2012-08-20 01:16, Michael Kay wrote:
On 20/08/2012 00:11, Michael Kay wrote:
Try the following:
count(distinct-values(map/f:signature(.))
Sorry, I misread the question as just wanting the number of distinct
maps. If you want the first map in each set of duplicates, use instead
<xsl:for-each-group select="map" group-by="f:signature(.)">
<xsl:sequence select="current-group()[1]"/>
</xsl:for-each-group>
Michael Kay
Saxonica
where f:signature is
<xsl:function name="f:signature">
<xsl:param name="map" as="element(map)">
<xsl:variable name="sorted-map" as="element(singletonMap)*">
<xsl:perform-sort select="$map/*">
<xsl:sort select="@from"/>
<xsl:sort select="@to"/>
</xsl:perform-sort>
</xsl:variable>
<xsl:sequence select="string-join($sorted-map/concat(@from, '~',
@to), '|')"/>
</xsl:function>
Michael Kay
Saxonica
On 19/08/2012 23:43, Costello, Roger L. wrote:
Hi Folks,
I have a sequence of 46,656 elements that I call "maps."
Here is one map:
<map id="Planes-Enroute-to-Airports">
<singletonMap from="F16" to="DFW"/>
<singletonMap from="B707" to="ORD"/>
<singletonMap from="F35" to="MIA"/>
<singletonMap from="S340" to="LAX"/>
<singletonMap from="A320" to="SFO"/>
<singletonMap from="MD90" to="DEN"/>
</map>
I wrote a function to return all of the distinct maps.
Unfortunately it takes about 5 hours of XSLT processing.
Perhaps my XSLT program is inefficient.
I am hoping that you can show me a more efficient program or identify
where my program is inefficient.
I am using XSLT 2.0.
Here is my function to return all distinct maps:
<xsl:function name="ct:distinct" as="element(map)*">
<xsl:param name="maps" />
<xsl:variable name="new-maps">
<maps>
<xsl:sequence select="$maps" />
</maps>
</xsl:variable>
<xsl:for-each select="$new-maps/maps/map">
<xsl:if test="not(ct:contained-within(.,
./following-sibling::map))"><xsl:sequence select="." /></xsl:if>
</xsl:for-each>
</xsl:function>
The following function determines if a map is contained within a
sequence of maps; it uses a binary divide-and-conquer approach:
<xsl:function name="ct:contained-within" as="xs:boolean">
<xsl:param name="map" as="element(map)"/>
<xsl:param name="maps" as="element(map)*"/>
<xsl:variable name="cnt" select="count($maps)" />
<xsl:choose>
<xsl:when test="$cnt eq 0"><xsl:value-of
select="false()" /></xsl:when>
<xsl:when test="ct:equal($map, $maps[1])"><xsl:value-of
select="true()" /></xsl:when>
<xsl:when test="$cnt eq 1"><xsl:value-of
select="false()" /></xsl:when>
<xsl:otherwise>
<xsl:variable name="half" select="$cnt idiv 2" />
<xsl:choose>
<xsl:when test="ct:contained-within($map,
$maps[position() = (2 to $half)])"><xsl:value-of select="true()"
/></xsl:when>
<xsl:otherwise><xsl:value-of
select="ct:contained-within($map, $maps[position() = (($half + 1) to
last())])" /></xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
Two maps are equal iff for each singletonMap in map1 there is a
singletonMap in map2 with the same value for @to and @from:
<xsl:function name="ct:equal" as="xs:boolean">
<xsl:param name="map1" as="element(map)"/>
<xsl:param name="map2" as="element(map)"/>
<xsl:choose>
<xsl:when test="count($map1/*) ne
count($map2/*)"><xsl:value-of select="false()" /></xsl:when>
<xsl:otherwise>
<xsl:variable name="result">
<xsl:for-each select="$map1/singletonMap">
<xsl:variable name="here" select="." />
<xsl:if test="$map2/singletonMap[@from eq
$here/@from and @to ne $here/@to]">false</xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:choose>
<xsl:when test="contains($result,
'false')"><xsl:value-of select="false()" /></xsl:when>
<xsl:otherwise><xsl:value-of select="true()"
/></xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
/Roger
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
Previous by Date: |
Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing, Michael Kay |
Next by Date: |
Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing, Imsieke, Gerrit, le-tex |
Previous by Thread: |
Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing, Michael Kay |
Next by Thread: |
Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing, Imsieke, Gerrit, le-tex |
Indexes: |
[Date]
[Thread]
[Top]
[All Lists] |
|
|