regexs, grouping (?) and XSLT2?

2004-08-07 12:43:28
I've got two problems I can't figure out. I've decided to use XSLT2 to do this, both because it seems more suited, and because I can use the exercise to learn a bit about it.

Problem one is probably easy, though not for me. I have a bunch of poorly marked-up XHTML documents that I need to convert to clean semantic code (in part so that I can then generate citation code from them).

I have paragraphs like:

<p>A "quote."</p>

I want the quoted text wrapped in XHTML q tags. The following code doesn't work, nor did any other variation I tried:

<xsl:template match="xhtml:p">
  <p>
    <xsl:apply-templates mode="quotes"/>
  </p>
</xsl:template>

<xsl:template match="xhtml:p" mode="quotes">
  <xsl:analyze-string select="." regex='"(.*?)"'>
    <xsl:matching-substring>
      <q><xsl:value-of select="regex-group(1)"/></q>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
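
One likely cause, offered as a sketch rather than a tested fix: xsl:apply-templates with mode="quotes" selects the children of the paragraph (here, text nodes), so a mode="quotes" template that matches xhtml:p is never invoked and the built-in rule simply copies the text through. Matching text() in that mode should let xsl:analyze-string run. The stylesheet wrapper and the default XHTML namespace declaration below are assumptions added to make the example self-contained; they are not part of the code above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:template match="xhtml:p">
    <p>
      <!-- selects the child nodes of p, not p itself -->
      <xsl:apply-templates mode="quotes"/>
    </p>
  </xsl:template>

  <!-- match the text nodes that the apply-templates call above selects -->
  <xsl:template match="text()" mode="quotes">
    <xsl:analyze-string select="." regex='"(.*?)"'>
      <xsl:matching-substring>
        <!-- regex-group(1) is the text between the two quote marks -->
        <q><xsl:value-of select="regex-group(1)"/></q>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For the sample paragraph this should yield A followed by a q element containing quote. with the straight quote marks dropped. Child elements inside a paragraph (em, a, and so on) would be flattened by the built-in rules in this mode, so a real stylesheet would also need a mode="quotes" template that copies elements and recurses.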

The second problem is more difficult, and is related to bibliographic and citation formatting.

Among the biggest PITAs I came across when trying to work out my own stylesheets was figuring out how to format multiple works by the same author. For example:

citations like (Doe, 1999a, 1999b) or (Smith and Jones, 2001b, 2001d), where each citation contains two references with the same author and year. BTW, these are coded like so in the new DocBook markup:

<citation>
   <biblioref linkend="doe99-1"/>
   <biblioref linkend="doe99-2"/>
</citation>

In reference lists, you (optionally) get this (sequential em-dashes replacing the creator name(s)):

Doe, John (1999a) ...
————. (1999b) ...
————. (1997) ...

With help from someone far more skilled than I, I arrived at the (partial) solution below, which seems rather awkward. (It seems I needed the same mess of code for both the citations and the bib lists, though that may be my own ignorance.)

So, the basic logic in author-year citation is to say:

look up author names (it gets more complicated if there is more than one, of course) and year
        if same, then append a suffix and drop the author from the second and later references

For bib lists, the same applies, except that the run of em-dashes replaces only the creator name(s).

So, is there any XSLT2 magic that makes this easier?

<xsl:template match="mods:mods">
<!-- variables -->
  <xsl:variable name="id" select="@ID"/>
  <xsl:variable name="year"
      select="substring(descendant::mods:date|descendant::mods:dateIssued,1,4)"/>
  <xsl:variable name="first.author"
      select="mods:name[@type='personal' and position()=1]/mods:namePart[@type='family']|
              mods:name[@type='corporate' and position()=1]/mods:namePart|
              mods:relatedItem[@type='host']/mods:titleInfo[not(@type='abbreviated') and not(ancestor::mods:mods/mods:name)]/mods:title"/>
  <xsl:variable name="refposition"
      select="1+count(
              preceding-sibling::mods:mods[mods:name[position()=1]/mods:namePart[@type='family']=$first.author][substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
              preceding-sibling::mods:mods[mods:name[@type='corporate' and position()=1]/mods:namePart=$first.author][substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
              preceding-sibling::mods:mods[mods:relatedItem[@type='host']/mods:titleInfo[not(@type='abbreviated') and not(ancestor::mods:mods/mods:name)]/mods:title[position()=1]=$first.author][substring(.//mods:dateIssued|.//mods:date,1,4)=$year])"/>
  <xsl:variable name="refposition.following"
      select="count(
              following-sibling::mods:mods[mods:name[position()=1]/mods:namePart[@type='family']=$first.author][substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
              following-sibling::mods:mods[mods:name[@type='corporate' and position()=1]/mods:namePart=$first.author][substring(.//mods:dateIssued|.//mods:date,1,4)=$year]|
              following-sibling::mods:mods[mods:relatedItem[@type='host']/mods:titleInfo[not(@type='abbreviated') and not(ancestor::mods:mods/mods:name)]/mods:title=$first.author][substring(.//mods:dateIssued|.//mods:date,1,4)=$year])"/>
  <xsl:message>
    <xsl:value-of select="concat($first.author,', ',$year,': ',$refposition,' ',$refposition.following)"/>
  </xsl:message>
  <xsl:variable name="suffix">
    <xsl:if test="$refposition+$refposition.following&gt;1">
      <xsl:value-of select="substring('abcdefghijklmnopqrstuvwxyz',$refposition,1)"/>
    </xsl:if>
  </xsl:variable>

  <!-- number of names that carry the editor role -->
  <xsl:variable name="editor-number"
      select="count(mods:name[mods:role/mods:roleTerm='editor'])"/>

  <p class="bibentry">
    <xsl:choose>
      <xsl:when test="mods:name">
        <span class="creator">
          <xsl:apply-templates select="mods:name"/>
          <xsl:if test="$editor-number &gt; 0">
            <xsl:choose>
              <!-- plural label only when there is more than one editor -->
              <xsl:when test="$editor-number &gt; 1">
                <xsl:text> (Eds.) </xsl:text>
              </xsl:when>
              <xsl:otherwise>
                <xsl:text> (Ed.) </xsl:text>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:if>
        </span>
      </xsl:when>
      <xsl:when test="mods:relatedItem/descendant::mods:issuance='continuing'">
        <xsl:value-of select="mods:relatedItem/mods:titleInfo/mods:title"/>
      </xsl:when>
    </xsl:choose>
    <xsl:text> (</xsl:text>
    <xsl:value-of select="concat($year,$suffix)"/>
    <xsl:text>) </xsl:text>
    <xsl:apply-templates select="mods:titleInfo[not(@type='abbreviated')]"/>
    <xsl:apply-templates select="mods:originInfo"/>
    <xsl:apply-templates select="mods:relatedItem"/>
    <xsl:apply-templates select="mods:genre"/>
    <xsl:apply-templates select="mods:location/mods:physicalLocation"/>
    <xsl:apply-templates select="mods:location/mods:url"/>
    <xsl:text>.</xsl:text>
  </p>
</xsl:template>
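
For comparison, the author-year logic described above might be expressed with XSLT 2.0's xsl:for-each-group instead of counting preceding and following siblings. What follows is only a minimal sketch under several assumptions that are not in the code above: the records are siblings under a mods:modsCollection element, the MODS namespace is http://www.loc.gov/mods/v3, the grouping key is simplified to the first name's family part (falling back to any namePart), and entries are sorted by author and then by ascending year. The variable names author, year, first-year and size are illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3"
    exclude-result-prefixes="mods">

  <!-- Assumption: all records are children of mods:modsCollection -->
  <xsl:template match="mods:modsCollection">
    <div class="bibliography">
      <!-- Outer groups: one per first-author key (simplified key) -->
      <xsl:for-each-group select="mods:mods"
          group-by="string((mods:name[1]/mods:namePart[@type='family'],
                            mods:name[1]/mods:namePart, '')[1])">
        <xsl:sort select="current-grouping-key()"/>
        <xsl:variable name="author" select="current-grouping-key()"/>
        <!-- Inner groups: one per year within that author -->
        <xsl:for-each-group select="current-group()"
            group-by="substring((.//mods:dateIssued, .//mods:date, '')[1], 1, 4)">
          <xsl:sort select="current-grouping-key()"/>
          <xsl:variable name="year" select="current-grouping-key()"/>
          <!-- true only for the first year group of this author -->
          <xsl:variable name="first-year" select="position() = 1"/>
          <!-- how many works share this author and year -->
          <xsl:variable name="size" select="count(current-group())"/>
          <xsl:for-each select="current-group()">
            <p class="bibentry">
              <xsl:choose>
                <!-- name on the author's first entry, dashes afterwards -->
                <xsl:when test="$first-year and position() = 1">
                  <xsl:value-of select="$author"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:text>&#x2014;&#x2014;&#x2014;&#x2014;.</xsl:text>
                </xsl:otherwise>
              </xsl:choose>
              <xsl:text> (</xsl:text>
              <xsl:value-of select="$year"/>
              <!-- suffix a, b, c... only when the author-year is shared -->
              <xsl:if test="$size gt 1">
                <xsl:value-of
                    select="substring('abcdefghijklmnopqrstuvwxyz', position(), 1)"/>
              </xsl:if>
              <xsl:text>) </xsl:text>
              <xsl:value-of
                  select="mods:titleInfo[not(@type='abbreviated')][1]/mods:title"/>
            </p>
          </xsl:for-each>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The grouping replaces the refposition and refposition.following variables: the suffix comes from the position within the author-year group, and the em-dash substitution from whether the entry is the first one for that author. The in-text citations would still need the same suffix for a given record, so a full solution would probably compute it in one place (for instance a named template or stylesheet function) and reuse it for both outputs.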