I need to process a set of documents organized into directories. A given
parent directory may have any number of subdirectories representing multiple
versions of the same logical artifact, with each directory name reflecting the
version, e.g.:
/A/B/C/en/1.0/foo.xml
/A/B/C/en/1.2/foo.xml
/A/B/C/fr/1.0/foo.xml
/A/B/C/fr/1.2/foo.xml
/A/B/C/fr/1.3/foo.xml
/A/B/D/en/1.0/foo.xml
/A/B/D/en/1.2/foo.xml
/A/B/D/en/1.3/foo.xml
/A/B/D/en/1.4/foo.xml
/A/B/D/fr/1.0/foo.xml
/A/B/D/fr/1.2/foo.xml
/A/B/D/fr/1.3/foo.xml
I need to process only those foo.xml files that are the latest version under a
given common ancestor (i.e., the latest version for each language, where the
/A/B/C path represents a single course in this case).
I'm doing this entirely within XSLT 3 (rather than using e.g., a bash shell to
determine the set of files to process), mostly because I'm tasked with
inserting an XSLT transform into an existing system where adding anything other
than an XSLT is problematic.
But I think this also serves as a useful exercise in general XSLT/XPath map
manipulation, at least as I've initially gone about trying to solve this
problem.
Given the list of URLs for all of these foo.xml files, I want to reduce it to
just /A/B/C/en/1.2/foo.xml, /A/B/C/fr/1.3/foo.xml, /A/B/D/en/1.4/foo.xml, and
/A/B/D/fr/1.3/foo.xml.
That is, for each locale in each course, get the latest version.
In addition, I want to group the files by the 3rd directory ("C" or "D"),
which serves as a "course ID".
Maps seem like an obvious way to do this:
1. Use Saxon's collection() function with the metadata=yes option to get a set
of maps, one for each file, that includes the full path to the file (this
avoids loading a bunch of files I don't actually want and gives me maps as a
starting point).
2. Using these maps, add the version, locale, and 3rd-level directory name as
separate entries in each map, creating a more complete set of "descriptor" maps
that make it easy to access the fields I care about.
3. Create a new map where the keys are the 3rd directory name ("course ID") and
the values are the descriptor maps with the highest version for each course
ID/locale pair.
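As a minimal sketch of step 2, assuming the file URI has already been pulled
out of each metadata map as a string: the function name local:descriptor, its
namespace, and the entry names 'uri', 'course-id', 'locale', and 'version' are
placeholders chosen for illustration, not anything Saxon provides.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:local-functions"
    exclude-result-prefixes="#all">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: split a path such as /A/B/C/en/1.0/foo.xml into
       its steps and pick out the fields by position, counting from the end. -->
  <xsl:function name="local:descriptor" as="map(*)">
    <xsl:param name="uri" as="xs:string"/>
    <xsl:variable name="steps" as="xs:string*" select="tokenize($uri, '/')"/>
    <xsl:variable name="n" as="xs:integer" select="count($steps)"/>
    <!-- The last step is the file name; version, locale, and course ID are
         the three steps before it. -->
    <xsl:sequence select="map{
        'uri':       $uri,
        'course-id': $steps[$n - 3],
        'locale':    $steps[$n - 2],
        'version':   $steps[$n - 1]}"/>
  </xsl:function>

  <!-- Small demonstration using one of the sample paths. -->
  <xsl:template name="xsl:initial-template">
    <xsl:variable name="d" as="map(*)"
        select="local:descriptor('/A/B/C/fr/1.3/foo.xml')"/>
    <xsl:value-of select="$d?course-id, $d?locale, $d?version" separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```

Counting from the end of the path rather than the start means the sketch does
not depend on how deep /A/B sits in the real file system.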
My question: How best to implement step 3?
Step 2 is simple data processing: pull apart each URL and create the maps.
Step 3 is less obvious because you have to compare entries based on the
course ID, locale, and version values.
My initial solution for step 3 is to use xsl:iterate to construct a result map:
<xsl:variable name="courses-by-id" as="map(xs:string, map(*)*)">
  <xsl:iterate select="$configs-to-use">
    <!-- Accumulates the best descriptor seen so far. Note that it is keyed
         on course ID alone, so one descriptor per course is kept. -->
    <xsl:param name="result-map" as="map(xs:string, map(*))"
      select="map{}"/>
    <xsl:on-completion>
      <xsl:sequence select="$result-map"/>
    </xsl:on-completion>
    <xsl:variable name="this-version" as="xs:double"
      select="xs:double(.?version)"/>
    <xsl:variable name="previous-course-entry" as="map(*)?"
      select="map:get($result-map, .?course-id)"
    />
    <!-- Version to beat: the stored one, or 0 if this course is new. -->
    <xsl:variable name="test-version" as="xs:double"
      select="
        if (exists($previous-course-entry))
        then xs:double($previous-course-entry?version)
        else 0.0
      "
    />
    <xsl:next-iteration>
      <xsl:with-param name="result-map" as="map(xs:string, map(*))"
        select="
          if ($this-version gt $test-version)
          then map:put($result-map, .?course-id, .)
          else $result-map"
      />
    </xsl:next-iteration>
  </xsl:iterate>
</xsl:variable>
This works (or at least appears to in my initial small tests) but it feels like
there ought to be a less verbose way to do this same kind of operation.
What is the better way to do this kind of "find the map entries that meet a
specific requirement relative to other members of the map" processing?
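For comparison, here is one possible grouping-based shape for step 3, offered
only as a sketch. It assumes each descriptor map carries 'course-id', 'locale',
and 'version' entries; the inline $configs-to-use value is made-up sample data
standing in for the real descriptors. It keeps the xs:double comparison used
above, which assumes the version strings order correctly as numbers.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all">

  <xsl:output method="text"/>

  <!-- Sample data only: stand-in for the step 2 descriptor maps. -->
  <xsl:variable name="configs-to-use" as="map(*)*" select="
      map{'course-id': 'C', 'locale': 'en', 'version': '1.0'},
      map{'course-id': 'C', 'locale': 'en', 'version': '1.2'},
      map{'course-id': 'C', 'locale': 'fr', 'version': '1.3'},
      map{'course-id': 'D', 'locale': 'en', 'version': '1.4'}"/>

  <!-- One map entry per course ID; its value is a sequence holding the
       highest-version descriptor for each locale in that course. -->
  <xsl:variable name="courses-by-id" as="map(xs:string, map(*)*)">
    <xsl:map>
      <xsl:for-each-group select="$configs-to-use" group-by=".?course-id">
        <xsl:map-entry key="string(current-grouping-key())">
          <xsl:for-each-group select="current-group()" group-by=".?locale">
            <!-- Sort the locale's descriptors by numeric version, ascending,
                 and keep the last one. -->
            <xsl:sequence select="
                sort(current-group(), (),
                     function($d) { xs:double($d?version) })[last()]"/>
          </xsl:for-each-group>
        </xsl:map-entry>
      </xsl:for-each-group>
    </xsl:map>
  </xsl:variable>

  <!-- Print one line per surviving descriptor: course ID, locale, version. -->
  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="map:keys($courses-by-id)">
      <xsl:variable name="id" as="xs:string" select="."/>
      <xsl:for-each select="$courses-by-id($id)">
        <xsl:value-of select="$id, .?locale, .?version" separator=" "/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The nested grouping treats course ID and locale as separate levels, so each
course entry can hold more than one descriptor, one per locale.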
Thanks,
Eliot
--
Eliot Kimber
http://contrext.com