xsl-list

RE: [xsl] generating ID strings that are both readable and unique

2008-10-14 05:44:29
Quite hard to do in "pure" XSLT 1.0 without a node-set() extension, because
I think any solution that is moderately efficient is going to involve some
temporary data.

I would create a temporary document containing all distinct ids/titles like
this

<xsl:variable name="allids">
  <xsl:for-each-group select="//section"
                      group-by="(@original-id, @title)[1]">
      <id id="{current-grouping-key()}" count="{count(current-group())}"/>
  </xsl:for-each-group>
</xsl:variable>

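Here's a function to get a unique ID derived from an input string and a
disambiguating suffix (such as a sequence number). It keeps appending the
suffix until the result no longer clashes with an entry in $allids: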

<xsl:function name="f:unique" as="xs:string">
  <xsl:param name="input" as="xs:string"/>
  <xsl:param name="gid" as="xs:string"/>
  <xsl:choose>
    <xsl:when test="exists($allids/id[@id=$input])">
      <xsl:sequence select="f:unique(concat($input, '_', $gid), $gid)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$input"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
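
The function above relies on the f: and xs: prefixes being declared on the
stylesheet element. A minimal sketch of that wrapper follows. The namespace
URI bound to f: is a placeholder of my own, not something from the original
post; any URI you control would do.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/local-functions"
    exclude-result-prefixes="xs f">

  <!-- xs: is needed for as="xs:string" -->
  <!-- f: is a placeholder namespace; stylesheet functions must be in a non-null namespace -->

  <!-- the $allids variable, f:unique and the section templates go here -->

</xsl:stylesheet>
```

exclude-result-prefixes keeps these two declarations from being copied onto
literal result elements.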

And then when processing an individual section, 

<xsl:attribute name="id">
  <xsl:choose>
    <xsl:when test="@id">
      <xsl:value-of select="@id"/>
    </xsl:when>
    <xsl:when test="$allids/id[@id=current()/@title]/@count = 1">
      <xsl:value-of select="@title"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="f:unique(@title, generate-id())"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:attribute>

Instead of using generate-id() for disambiguation, you could use the result
of xsl:number level="any". This would mean that if there are two sections
titled "Introduction", one gets the id "Introduction_1", the other
"Introduction_2". In the rare event that "Introduction_1" is already in use,
you would get "Introduction_1_1" etc.
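
A rough sketch of that xsl:number variant, as a replacement for the
xsl:otherwise branch above. The variable names $this-title and $seq are mine,
and the count pattern is an assumption: it restricts the numbering to
sections that share the current title, so the suffixes run 1, 2, ... within
each title rather than across all sections.

```xml
<xsl:otherwise>
  <!-- $this-title and $seq are illustrative names, not from the original post -->
  <xsl:variable name="this-title" select="@title"/>
  <!-- number this section among all sections with the same title, in document order -->
  <xsl:variable name="seq" as="xs:string">
    <xsl:number level="any" count="section[@title = $this-title]"/>
  </xsl:variable>
  <!-- f:unique appends '_' and $seq, repeating if that string is already in $allids -->
  <xsl:value-of select="f:unique(@title, $seq)"/>
</xsl:otherwise>
```

The title is captured in a variable first because current() inside a pattern
refers to the node being tested against the pattern, not to the section being
processed. Note that same-titled sections which already carry an id are still
counted, so the numbers actually used may have gaps.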

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Trevor Nicholls [mailto:trevor(_at_)castingthevoid(_dot_)com] 
Sent: 14 October 2008 07:26
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] generating ID strings that are both readable and unique

Hi

In this particular application we have a set of XML documents 
which are divided into nested sections; each section may 
(down the track) give rise to a url. Currently that url is 
generated by <xsl:number level="multiple"> but this produces 
urls that change frequently. Some sections have been given an 
ID attribute by the process which originally created the 
documents, but most have not.
Additionally, all sections must have exactly one title child, 
along with their other content.

The requirement is to process an XML file and generate an ID 
attribute for sections which lack them - deriving the ID 
value from the title so that the url is comprehensible. 
Providing we ignore the problem cases, this is a trivial exercise:

----

 <xsl:variable name="upchars" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
 <xsl:variable name="lochars" select="'abcdefghijklmnopqrstuvwxyz'" />

 <!-- catchall -->
 <xsl:template match="*">
   <xsl:copy>
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates />
   </xsl:copy>
 </xsl:template>

 <xsl:template match="@*">
   <xsl:copy-of select="." />
 </xsl:template>

 <xsl:template match="section[@id]">
   <xsl:copy>
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates />
   </xsl:copy>
 </xsl:template>

 <xsl:template match="section">
   <xsl:copy>
     <xsl:attribute name="id">
       <xsl:apply-templates select="title" mode="id" />
     </xsl:attribute>
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates />
   </xsl:copy>
 </xsl:template>

 <xsl:template match="title" mode="id">
   <xsl:value-of select="translate(translate(., ' ', '_'), $upchars, $lochars)" />
 </xsl:template>

----

The problem cases are
(a) duplicate titles (after the translations) which would 
lead to duplicate IDs, and
(b) existing IDs which might also duplicate a title.

If there were no IDs in the document to begin with, I think I 
could have solved the first problem by using a key. But the 
second problem complicates it, and I haven't got enough 
experience with keys to figure out how to adjust the "id" 
mode title template to take both issues into account.

Can anyone offer some helpful advice here?
XSL 1.0 is preferred, although I would be interested to see 
how XSL2 might handle this problem too. 

Thanks
Trevor



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
