xsl-list

Re: [xsl] building a hierarchical classification out of flat and redundant data

2006-07-25 01:01:38
Hi David,

Last week a nasty job landed on my desk, and it is essentially the same problem.
I have this XML:

<modul>
<unit id="1">
<subunit>Rupturas</subunit>
<sub-subunit>sistema </sub-subunit>
<sub-subunit>incertidumbre</sub-subunit>
<subunit>Megatendencias</subunit>
<sub-subunit>Caracterización</sub-subunit>
<sub-sub-subunit>1.2.1.1.</sub-sub-subunit>
<p>Text 1211</p>
<param>Text 2 1211</param>
<sub-sub-subunit>1.2.1.2.</sub-sub-subunit>
<sub-sub-subunit>1.2.1.3.</sub-sub-subunit>
<sub-subunit>Vectores</sub-subunit>
<sub-sub-subunit>1.2.2.1.</sub-sub-subunit>
<sub-sub-subunit>1.2.2.2.</sub-sub-subunit>
<sub-sub-subunit>1.2.2.3.</sub-sub-subunit>
<subunit>Perspectivas</subunit>
<sub-subunit>Ideologías</sub-subunit>
<sub-sub-subunit>1.3.1.1.</sub-sub-subunit>
<sub-sub-subunit>1.3.1.2.</sub-sub-subunit>
<sub-subunit>controversia</sub-subunit>
<sub-sub-subunit>1.3.2.1.</sub-sub-subunit>
<sub-sub-subunit>1.3.2.2.</sub-sub-subunit>
</unit>
<unit id="2">
<p>Desafíos sociolaboral</p>
<subunit>Cantidad</subunit>
<p>Text Cantidad</p>
<sub-subunit>riqueza</sub-subunit>
<sub-subunit>paraíso</sub-subunit>
<sub-subunit>materia</sub-subunit>
<sub-subunit>panorama a las perspectivas</sub-subunit>
<subunit>Calidad</subunit>
<sub-subunit>Polarización</sub-subunit>
<sub-subunit>La cara</sub-subunit>
<sub-subunit>La cruz</sub-subunit>
<sub-subunit>Precarización</sub-subunit>
<subunit>experiencia</subunit>
<sub-subunit>Ejes</sub-subunit>
<sub-subunit>Condiciones</sub-subunit>
<sub-sub-subunit>2.3.2.1.</sub-sub-subunit>
<sub-sub-subunit>2.3.2.2.</sub-sub-subunit>
<sub-sub-subunit>2.3.2.3.</sub-sub-subunit>
<subunit>paradigma</subunit>
<sub-subunit>civilización</sub-subunit>
<sub-subunit>empleísmo</sub-subunit>
<sub-subunit>Agenda</sub-subunit>
</unit>
</modul>

I have to convert the content of each unit element into a hierarchical XML
structure, under these conditions:
- Other elements can appear between the heading elements; each of them
belongs to the nearest preceding heading (or to the unit itself if no
heading precedes it).
- The hierarchy is: unit, subunit, sub-subunit and sub-sub-subunit.
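If an XSLT 2.0 processor is available (Saxon, for example), these two conditions map fairly directly onto nested xsl:for-each-group with group-starting-with: split the unit's children at each subunit, split each group again at each sub-subunit, and so on. The following is a minimal sketch, not tested against real data; it assumes the heading names are exactly the four listed above, and the function name f:group and its namespace URI are made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local-functions"
    exclude-result-prefixes="xs f">
  <xsl:output method="xml" indent="yes"/>

  <!-- Heading element names, outermost level first (assumed hierarchy) -->
  <xsl:variable name="levels" as="xs:string*"
      select="('subunit', 'sub-subunit', 'sub-sub-subunit')"/>

  <xsl:template match="modul">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="unit">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:sequence select="f:group(*, 1)"/>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Split $items wherever a heading of level $n starts; each heading becomes
       a wrapper with a <title>, and its group is split again at level $n + 1. -->
  <xsl:function name="f:group" as="element()*">
    <xsl:param name="items" as="element()*"/>
    <xsl:param name="n" as="xs:integer"/>
    <xsl:for-each-group select="$items"
        group-starting-with="*[name() = $levels[$n]]">
      <xsl:choose>
        <xsl:when test="name() = $levels[$n]">
          <!-- The context item is the heading that starts this group -->
          <xsl:copy>
            <title><xsl:value-of select="."/></title>
            <xsl:sequence select="f:group(current-group() except ., $n + 1)"/>
          </xsl:copy>
        </xsl:when>
        <xsl:when test="$n lt count($levels)">
          <!-- Content before the first heading of this level: try the next level down -->
          <xsl:sequence select="f:group(current-group(), $n + 1)"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Plain content (p, param, ...) is copied through unchanged -->
          <xsl:sequence select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:function>
</xsl:stylesheet>
```

Because each recursion level only looks for its own heading name, p and param elements simply travel with whichever group they fall into, and content that precedes every heading (such as the first p of unit 2) stays directly inside the unit.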

Here are the expected result file and my solution.

<modul>
        <unit id="1">
                <subunit>
                        <title>Rupturas</title>
                        <sub-subunit>
                                <title>sistema </title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>incertidumbre</title>
                        </sub-subunit>
                </subunit>
                <subunit>
                        <title>Megatendencias</title>
                        <sub-subunit>
                                <title>Caracterización</title>
                                <sub-sub-subunit>
                                        <title>1.2.1.1.</title>
                                        <p>Text 1211</p>
                                        <param>Text 2 1211</param>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>1.2.1.2.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>1.2.1.3.</title>
                                </sub-sub-subunit>
                        </sub-subunit>
                        <sub-subunit>
                                <title>Vectores</title>
                                <sub-sub-subunit>
                                        <title>1.2.2.1.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>1.2.2.2.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>1.2.2.3.</title>
                                </sub-sub-subunit>
                        </sub-subunit>
                </subunit>
                <subunit>
                        <title>Perspectivas</title>
                        <sub-subunit>
                                <title>Ideologías</title>
                                <sub-sub-subunit>
                                        <title>1.3.1.1.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>1.3.1.2.</title>
                                </sub-sub-subunit>
                        </sub-subunit>
                        <sub-subunit>
                                <title>controversia</title>
                                <sub-sub-subunit>
                                        <title>1.3.2.1.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>1.3.2.2.</title>
                                </sub-sub-subunit>
                        </sub-subunit>
                </subunit>
        </unit>
        <unit id="2">
                <p>Desafíos sociolaboral</p>
                <subunit>
                        <title>Cantidad</title>
                        <p>Text Cantidad</p>
                        <sub-subunit>
                                <title>riqueza</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>paraíso</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>materia</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>panorama a las perspectivas</title>
                        </sub-subunit>
                </subunit>
                <subunit>
                        <title>Calidad</title>
                        <sub-subunit>
                                <title>Polarización</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>La cara</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>La cruz</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>Precarización</title>
                        </sub-subunit>
                </subunit>
                <subunit>
                        <title>experiencia</title>
                        <sub-subunit>
                                <title>Ejes</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>Condiciones</title>
                                <sub-sub-subunit>
                                        <title>2.3.2.1.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>2.3.2.2.</title>
                                </sub-sub-subunit>
                                <sub-sub-subunit>
                                        <title>2.3.2.3.</title>
                                </sub-sub-subunit>
                        </sub-subunit>
                </subunit>
                <subunit>
                        <title>paradigma</title>
                        <sub-subunit>
                                <title>civilización</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>empleísmo</title>
                        </sub-subunit>
                        <sub-subunit>
                                <title>Agenda</title>
                        </sub-subunit>
                </subunit>
        </unit>
</modul>

This is my solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="modul">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild each unit by walking its children, starting with the first one -->
  <xsl:template match="unit">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:call-template name="process-node">
        <xsl:with-param name="node-father" select="name()"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <!-- Copy any other element unchanged -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!--
    Apply templates in mode "copia" to each sibling from $context onwards,
    stopping at the element whose generate-id() equals $target.
    (Not called by the other templates in this stylesheet.)
  -->
  <xsl:template name="get-block">
    <xsl:param name="context" select="."/>
    <xsl:param name="target"/>

    <xsl:if test="generate-id($context)!=$target">
      <xsl:apply-templates select="$context" mode="copia"/>
      <xsl:variable name="next-element" select="$context/following-sibling::*[1]"/>
      <xsl:if test="$next-element">
        <xsl:call-template name="get-block">
          <xsl:with-param name="context" select="$next-element"/>
          <xsl:with-param name="target" select="$target"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>

  <!--
    If $context is a heading, wrap it and its content in an element of the
    same name, then continue with the next heading of the same type if it
    has the same parent heading. Any other element is copied and processing
    moves on to the next sibling.
  -->
  <xsl:template name="process-node">
    <xsl:param name="context" select="*[1]"/>
    <xsl:param name="node-father"/>

    <xsl:choose>
      <xsl:when test="$context[self::unit or self::subunit or self::sub-subunit or self::sub-sub-subunit]">
        <xsl:variable name="node-type" select="name($context)"/>
        <xsl:element name="{$node-type}">
          <title><xsl:value-of select="$context"/></title>
          <xsl:call-template name="generate-block">
            <xsl:with-param name="context" select="$context/following-sibling::*[1]"/>
            <xsl:with-param name="node-type" select="$node-type"/>
          </xsl:call-template>
        </xsl:element>

        <xsl:variable name="seguent-node" select="$context/following-sibling::*[name()=$node-type][1]"/>

        <xsl:variable name="fathers-name">
          <xsl:call-template name="get-pare">
            <xsl:with-param name="unitat" select="$node-type"/>
          </xsl:call-template>
        </xsl:variable>

        <!-- Continue only if the next heading is of the same type
             and has the same parent heading -->
        <xsl:if test="$seguent-node and name($seguent-node)=$node-type and
            (generate-id($seguent-node/preceding-sibling::*[name()=$fathers-name][1])
             = generate-id($context/preceding-sibling::*[name()=$fathers-name][1]))">
          <xsl:call-template name="process-node">
            <xsl:with-param name="context" select="$seguent-node"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="$context"/>
        <xsl:if test="$context/following-sibling::*">
          <xsl:call-template name="process-node">
            <xsl:with-param name="context" select="$context/following-sibling::*[1]"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!--
    Output the content that belongs to a heading of type $node-type, starting
    at $context, until a heading of the same or a higher level is reached.
  -->
  <xsl:template name="generate-block">
    <xsl:param name="context"/>
    <xsl:param name="node-type"/>

    <xsl:if test="$context">
      <!-- Where does processing stop? At any heading listed in $pares -->
      <xsl:variable name="pares">
        <xsl:call-template name="get-ordre-unitat">
          <xsl:with-param name="unitat" select="$node-type"/>
        </xsl:call-template>
      </xsl:variable>
      <xsl:variable name="node-limit" select="contains($pares,concat('*',name($context),'*'))"/>

      <xsl:if test="not($node-limit)">
        <xsl:choose>
          <xsl:when test="$context[self::unit or self::subunit or self::sub-subunit or self::sub-sub-subunit]">
            <xsl:call-template name="process-node">
              <xsl:with-param name="context" select="$context"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="$context"/>
            <xsl:call-template name="generate-block">
              <xsl:with-param name="context" select="$context/following-sibling::*[1]"/>
              <xsl:with-param name="node-type" select="$node-type"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:if>
    </xsl:if>
  </xsl:template>

  <!-- Sets the hierarchical order: returns the headings at which a block of
       the given type ends (the type itself and every level above it) -->
  <xsl:template name="get-ordre-unitat">
    <xsl:param name="unitat"/>

    <xsl:choose>
      <xsl:when test="$unitat='unit'">
        <xsl:value-of select="'*unit*'"/>
      </xsl:when>
      <xsl:when test="$unitat='subunit'">
        <xsl:value-of select="'*unit*subunit*'"/>
      </xsl:when>
      <xsl:when test="$unitat='sub-subunit'">
        <xsl:value-of select="'*unit*subunit*sub-subunit*'"/>
      </xsl:when>
      <xsl:when test="$unitat='sub-sub-subunit'">
        <xsl:value-of select="'*unit*subunit*sub-subunit*sub-sub-subunit*'"/>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Returns the parent level of the given heading type -->
  <xsl:template name="get-pare">
    <xsl:param name="unitat"/>

    <xsl:choose>
      <xsl:when test="$unitat='unit'">
        <xsl:value-of select="''"/>
      </xsl:when>
      <xsl:when test="$unitat='subunit'">
        <xsl:value-of select="'unit'"/>
      </xsl:when>
      <xsl:when test="$unitat='sub-subunit'">
        <xsl:value-of select="'subunit'"/>
      </xsl:when>
      <xsl:when test="$unitat='sub-sub-subunit'">
        <xsl:value-of select="'sub-subunit'"/>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
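For comparison, a common XSLT 1.0 technique avoids the hand-written sibling recursion: index every element under the heading that owns it with xsl:key, then let each heading pull in its own content with key(). The sketch below is an illustration under assumptions, not a drop-in replacement: the key name owned-by is made up, and it assumes levels are not skipped (a sub-subunit always follows some subunit, and a sub-sub-subunit always follows some sub-subunit of the same subunit); an element that breaks this rule would be lost or attached to the wrong parent.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- A sub-subunit belongs to the nearest preceding subunit. -->
  <xsl:key name="owned-by" match="sub-subunit"
           use="generate-id(preceding-sibling::subunit[1])"/>
  <!-- A sub-sub-subunit belongs to the nearest preceding sub-subunit. -->
  <xsl:key name="owned-by" match="sub-sub-subunit"
           use="generate-id(preceding-sibling::sub-subunit[1])"/>
  <!-- Any other child of a unit belongs to the nearest preceding heading of any level.
       Elements with no preceding heading get the key "" and are handled by the unit template. -->
  <xsl:key name="owned-by"
           match="unit/*[not(self::subunit or self::sub-subunit or self::sub-sub-subunit)]"
           use="generate-id(preceding-sibling::*[self::subunit or self::sub-subunit
                                                 or self::sub-sub-subunit][1])"/>

  <xsl:template match="modul">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="unit"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="unit">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Content that comes before any heading stays directly in the unit. -->
      <xsl:copy-of select="*[not(self::subunit or self::sub-subunit or self::sub-sub-subunit)]
          [not(preceding-sibling::*[self::subunit or self::sub-subunit or self::sub-sub-subunit])]"/>
      <xsl:apply-templates select="subunit"/>
    </xsl:copy>
  </xsl:template>

  <!-- A heading becomes a wrapper: its text as <title>, then everything it owns,
       in document order (plain content first, then lower-level headings). -->
  <xsl:template match="subunit | sub-subunit | sub-sub-subunit">
    <xsl:copy>
      <title><xsl:value-of select="."/></title>
      <xsl:apply-templates select="key('owned-by', generate-id())"/>
    </xsl:copy>
  </xsl:template>

  <!-- Non-heading content (p, param, ...) is copied unchanged. The default
       priority of "*" (-0.5) keeps this below the heading template. -->
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

The design trade-off: each element's owner is computed once by the key, so there is no need for get-ordre-unitat or get-pare, but the ownership rules are spelled out per level instead of being driven by a single ordered list.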


2006/7/24, Georg Hohmann <georg(_dot_)hohmann(_at_)gmail(_dot_)com>:
Dear XSLT-Community,

I have a problem with some "strange" data that I have to
convert to a hierarchical XML structure.

My source is a huge XML file that represents a decimal
classification. It contains so-called documents, each of which
represents one node of the classification. In addition, each document
lists the node's ancestors, from the top level down. The structure looks
like this (example taken from http://www.udcc.org):
...
<document>
       <tag1>3</tag1>
       <tag1a>Social Sciences</tag1a>
</document>
<document>
       <tag1>3</tag1>
       <tag1a>Social Sciences</tag1a>
       <tag2>32</tag2>
       <tag2a>Politics</tag2a>
</document>
<document>
       <tag1>3</tag1>
       <tag1a>Social Sciences</tag1a>
       <tag2>32</tag2>
       <tag2a>Politics</tag2a>
       <tag3>326</tag3>
       <tag3a>Slavery</tag3a>
</document>
...
As you can see, there is no hierarchical information in it apart from
the names and the order of the tags. My real data has up to 9
levels, though not every entry goes that deep. My result should look
like this (or something similar):
...
<node id="3" name="Social Sciences">
  <node id="32" name="Politics">
     <node id="326" name="Slavery"/>
  </node>
</node>
...
I simply have no idea where to start to achieve this result. I
guess the first step would be to get rid of all the redundant
content, but I don't know how, and I can't figure out how to
build the hierarchical structure at the same time.

Does anyone have a good starting point for this?
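One possible starting point in XSLT 1.0: treat each document as the node named by its deepest tagN/tagNa pair, and index every document by its parent's id (the second-to-last tagN value) with xsl:key. The tree can then be built top-down, so the redundant ancestor copies never have to be removed explicitly. This is a sketch under several assumptions, none confirmed by the question: the documents are direct children of the root element; id elements are the ones whose name does not end in "a", each immediately followed by its label; every node has exactly one document of its own; and ids are unique across levels. The key name by-parent and the wrapper element classification are made up for illustration.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index each document by its parent's id: the id element just before
       its last id element. Id elements are assumed to be the tagN elements,
       i.e. those whose name does not end in "a". Top-level documents have
       no such element and are not indexed. -->
  <xsl:key name="by-parent" match="document"
           use="*[not(substring(name(), string-length(name())) = 'a')][last() - 1]"/>

  <!-- Start from the top-level documents: exactly one id element. -->
  <xsl:template match="/*">
    <classification>
      <xsl:apply-templates
          select="document[count(*[not(substring(name(), string-length(name())) = 'a')]) = 1]"/>
    </classification>
  </xsl:template>

  <!-- One node per document, from its deepest id/label pair; then recurse
       into the documents whose parent id is this id. -->
  <xsl:template match="document">
    <xsl:variable name="id"
        select="*[not(substring(name(), string-length(name())) = 'a')][last()]"/>
    <node id="{$id}" name="{$id/following-sibling::*[1]}">
      <xsl:apply-templates select="key('by-parent', $id)"/>
    </node>
  </xsl:template>
</xsl:stylesheet>
```

Run on the three example documents wrapped in a root element, this should produce the nested node structure shown above. If some nodes only ever appear as ancestors and never get a document of their own, they would be missing from the output, and a distinct-value grouping step over all tagN values would be needed first.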

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
