xsl-list

Re: [xsl] Dealing mixed content with invalid node-like text

2011-12-04 14:31:31
On Sun, Dec 04, 2011 at 03:00:36PM -0500, Syd Bauman scripsit:
[parsing a string containing an imbalanced XML fragment into nodes]
> In which case, someone who knows more about such things will need to
> answer, as I don't think I know how to convert a string to a sequence
> of nodes or a result tree fragment. I'm not really sure why one would
> want to do such a thing,

Sometimes you get mixed content that needs to be wrapped on delimiters
in the string -- think of a comma-separated list of links with
associated ancillary text, where you want to have output that replaces
the comma delimiters with a wrapper element but keep the link elements
in the output.  The best way I know of to do this is to serialize the
whole chunk of input, tokenize on the delimiter pattern, and convert the
results back into nodes.
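
As a rough sketch of that serialize / tokenize / re-parse round trip (not
taken from any production stylesheet): the "serialize" mode, the <links>
and <item> element names, and the comma pattern are all assumptions made
for illustration. The templates are meant to sit in the same stylesheet as
the d:parseFragmentString function given further down, with the xs and d
prefixes declared.

<!-- hypothetical serializer: handles elements, attributes and text only;
     text nodes fall through to the built-in template and are not re-escaped -->
<xsl:template match="*" mode="serialize">
  <xsl:value-of select="concat('&lt;', name())"/>
  <xsl:for-each select="@*">
    <xsl:value-of select="concat(' ', name(), '=&quot;', ., '&quot;')"/>
  </xsl:for-each>
  <xsl:text>&gt;</xsl:text>
  <xsl:apply-templates mode="serialize"/>
  <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
</xsl:template>

<!-- hypothetical container: <links> holding a comma-separated mixed-content list -->
<xsl:template match="links">
  <!-- flatten the mixed content into a single string -->
  <xsl:variable name="flat" as="xs:string">
    <xsl:value-of>
      <xsl:apply-templates mode="serialize"/>
    </xsl:value-of>
  </xsl:variable>
  <xsl:copy>
    <!-- split on the delimiter, then turn each piece back into nodes -->
    <xsl:for-each select="tokenize($flat, ',\s*')">
      <item>
        <xsl:sequence select="d:parseFragmentString(.)"/>
      </item>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>

Given <links><a href="a.html">A</a> first, <a href="b.html">B</a> second</links>,
this should wrap each link and its trailing text in its own <item>. A comma
inside an attribute value or inside the link text would split in the wrong
place, so the delimiter pattern needs to suit the data.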

In XSLT 2.0, you can do the node reconstitution using a recursive
function:

<xsl:function as="node()*" name="d:parseFragmentString">
  <xsl:param as="xs:string" name="instring"/>
  <xsl:choose>
    <xsl:when test="not(normalize-space($instring))">
      <!-- stop; we're out of string -->
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;\p{L}')">
      <!-- we start with an element tag; figure out what it is, create it, and
           call again on the element contents and everything after the element -->
      <xsl:variable name="eName">
        <xsl:choose>
          <xsl:when test="matches($instring,'^&lt;\w+&gt;')">
            <!-- no attributes -->
            <xsl:sequence
              select="replace(substring-before($instring,'&gt;'),'^&lt;','')"/>
          </xsl:when>
          <xsl:when test="matches($instring,'^&lt;\w+/&gt;')">
            <!-- no attributes, empty element -->
            <xsl:sequence
              select="replace(substring-before($instring,'/&gt;'),'^&lt;','')"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- attributes -->
            <xsl:sequence select="replace(substring-before($instring,' '),'^&lt;','')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="attribString">
        <xsl:choose>
          <xsl:when test="matches($instring,'^&lt;\w+&gt;')">
            <xsl:sequence select="()"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:sequence
              select="substring-after(substring-before($instring,'&gt;'),' ')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="closeTag" select="concat('&lt;/',$eName,'&gt;')"/>
      <!-- construct the element, its attributes if any, and call again on its
           contents -->
      <xsl:element name="{$eName}">
        <xsl:if test="$attribString">
          <xsl:variable name="attribList"
            select="tokenize($attribString,'\s+')"/>
          <xsl:for-each select="$attribList">
            <xsl:variable name="name" select="substring-before(.,'=')"/>
            <xsl:variable name="value"
              select="substring-before(substring-after(.,'&quot;'),'&quot;')"/>
            <xsl:attribute name="{$name}">
              <xsl:value-of select="$value"/>
            </xsl:attribute>
          </xsl:for-each>
        </xsl:if>
        <!-- before the close tag but after the first > which closes this
             initial element -->
        <xsl:sequence
          select="d:parseFragmentString(substring-after(substring-before($instring,$closeTag),'&gt;'))"/>
      </xsl:element>
      <!-- everything after the element -->
      <xsl:if test="substring-after($instring,$closeTag)">
        <xsl:sequence
          select="d:parseFragmentString(substring-after($instring,$closeTag))"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;/')">
      <!-- we've made it down to a close tag; if there's anything after it,
           process that -->
      <xsl:if test="normalize-space(substring-after($instring,'&gt;'))">
        <xsl:sequence
          select="d:parseFragmentString(substring-after($instring,'&gt;'))"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;\?')">
      <!-- oh look a processing instruction -->
      <xsl:processing-instruction
        name="{substring-after(substring-before($instring,' '),'&lt;?')}"
        select="substring-after(substring-before($instring,'?&gt;'),' ')"/>
      <xsl:sequence
        select="d:parseFragmentString(substring-after($instring,'?&gt;'))"/>
    </xsl:when>
    <xsl:when test="matches($instring,'^[^&lt;]')">
      <!-- it's not a delimited node; emit it as a text node, and call again on
           everything after the first < if we have one -->
      <xsl:choose>
        <xsl:when test="contains($instring,'&lt;')">
          <xsl:value-of select="substring-before($instring,'&lt;')"/>
          <xsl:sequence
            select="d:parseFragmentString(concat('&lt;',substring-after($instring,'&lt;')))"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- nothing but a string, but it can have escaped XML entities in it
               which we need to unescape-->
          <xsl:value-of select="d:unEscapeXMLEntities($instring)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;$')">
      <!-- we have a wandering less-than sign -->
      <xsl:value-of select="$instring"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message>
        <xsl:text>NO MATCH!&#x000A;</xsl:text>
        <xsl:text>:|</xsl:text>
        <xsl:value-of select="$instring"/>
        <xsl:text>|:&#x000A;</xsl:text>
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
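
The function relies on the xs and d prefixes being declared and on a
d:unEscapeXMLEntities helper that is not shown here. A minimal sketch of
the surrounding stylesheet might look like the following; the namespace URI
bound to d, the body of the helper, and the test string are assumptions for
illustration, not the original code.

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:d="http://example.com/ns/fragment"
  exclude-result-prefixes="xs d">

  <!-- hypothetical helper: undo the five predefined XML entities;
       &amp;amp; is handled last so that &amp;amp;lt; ends up as &amp;lt; and not as a bare less-than sign -->
  <xsl:function name="d:unEscapeXMLEntities" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="
      replace(replace(replace(replace(replace($s,
        '&amp;lt;', '&lt;'),
        '&amp;gt;', '&gt;'),
        '&amp;quot;', '&quot;'),
        '&amp;apos;', ''''),
        '&amp;amp;', '&amp;')"/>
  </xsl:function>

  <!-- d:parseFragmentString, as given above, goes here -->

  <!-- hypothetical smoke test: parse one string containing a link -->
  <xsl:template match="/">
    <result>
      <xsl:sequence
        select="d:parseFragmentString('see &lt;a href=&quot;x.html&quot;&gt;this&lt;/a&gt; page')"/>
    </result>
  </xsl:template>
</xsl:stylesheet>

Run against any input document, the test string should come back as a text
node, an <a> element carrying the href attribute, and a trailing text node.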

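The above works in its context. I should not care to assert that it was
fully general (it will not cope with nested elements of the same name,
attribute values containing whitespace, or comments, for example), but it
ought to at least present a notion of how to approach the problem.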

-- Graydon
