xsl-list

Re: [xsl] problem with transforming mixed content

2020-08-15 06:01:26
Like Graydon's solution, this solution falls into category (b): convert the 
markup to text, then process as text. And like Graydon's solution, it makes 
assumptions about the markup and text content that can be encountered in the 
mixed content: in this case, the only markup it handles is what appears in the 
supplied test case, that is, an <i> element with no attributes, and it assumes 
that the '##' sequence won't appear naturally. The problem with this kind of 
solution is that when you process 10,000 input documents it will do the right 
thing for 9,999 of them, and you need very good testing to catch the failures. 
In fact, you'll only catch the failure if you put a lot more effort into the 
testing than you put into the actual code.
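
One way to make those hidden assumptions visible is to check for them before running the transform. The following is only a sketch, not part of either posted solution: the report wording is made up for illustration, and it assumes the input documents use a title element as in the supplied test case. It also flags two cases the text above does not mention but which follow from the posted code: a comment or processing instruction directly inside title matches node() in pass 1, so its content would end up in the output, and a title with no ':' yields an empty title and subtitle.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text"/>

   <!-- Visit every title in the document and report broken assumptions. -->
   <xsl:template match="/">
      <xsl:apply-templates select="//title"/>
   </xsl:template>

   <xsl:template match="title">
      <!-- Assumption: the only markup is an i element with no attributes
           and no nested elements. -->
      <xsl:if test="*[not(self::i)] or i/@* or i/*">
         <xsl:text>title contains markup other than a plain i element&#10;</xsl:text>
      </xsl:if>
      <!-- Comments and processing instructions are matched by node() in
           pass 1, so their content would be emitted as text. -->
      <xsl:if test="comment() or processing-instruction()">
         <xsl:text>title contains a comment or processing instruction&#10;</xsl:text>
      </xsl:if>
      <!-- Assumption: the delimiter does not occur naturally in the text. -->
      <xsl:if test="contains(., '##')">
         <xsl:text>title text already contains the ## delimiter&#10;</xsl:text>
      </xsl:if>
      <!-- substring-before and substring-after both return '' without a colon. -->
      <xsl:if test="not(contains(., ':'))">
         <xsl:text>title has no colon to split on&#10;</xsl:text>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>
```

Running a check like this over the full document set shows how many inputs fall outside what a single test case exercises; it does not replace testing the transform itself.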

(I'm working this morning on a bug I've created in the course of Saxon 
development that causes just 2 tests out of 30,000 in the QT3 test suite to 
fail. Or there might be two bugs, of course. Indeed, more worryingly, there 
might be three, and the tests are only catching two of them. As I'm sure you've 
found in your work on Xerces, you can have a vast test suite and bugs can still 
slip through. The general assumption with question-and-answer forums seems to 
be that one test case is enough, and that's blatantly wrong.)

Mukul wrote:


I've come up with the following XSLT transform, which seems to work for this
use case:

<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="title">
      <result>
         <xsl:variable name="result_pass1" as="xs:string*">
            <xsl:apply-templates select="node()" mode="pass1"/>
         </xsl:variable>
         <title>
            <xsl:for-each select="tokenize(normalize-space(substring-before(string-join($result_pass1, ''), ':')), '##')">
               <xsl:call-template name="process_tokenize_result_item">
                <xsl:with-param name="inpStr" select="."/>
               </xsl:call-template>
            </xsl:for-each>
         </title>
         <subtitle>
            <xsl:for-each select="tokenize(normalize-space(substring-after(string-join($result_pass1, ''), ':')), '##')">
               <xsl:call-template name="process_tokenize_result_item">
                  <xsl:with-param name="inpStr" select="."/>
               </xsl:call-template>
            </xsl:for-each>
         </subtitle>
      </result>
   </xsl:template>
   
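   <xsl:template name="process_tokenize_result_item">
      <xsl:param name="inpStr" as="xs:string"/>

      <!-- call-template keeps the caller's focus, so position() is the
           token's index in the for-each; under the '##' convention the
           even-numbered tokens are the ones that were inside i elements -->
      <xsl:choose>
         <xsl:when test="position() mod 2 = 0">
            <i>
               <xsl:value-of select="$inpStr"/>
            </i>
         </xsl:when>
         <xsl:otherwise>
            <xsl:value-of select="$inpStr"/>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>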
   
   <xsl:template match="node()" mode="pass1">
       <xsl:choose>
          <xsl:when test="self::i">
             <xsl:value-of select="concat('##', lower-case(.), '##')"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
          </xsl:otherwise>
       </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

The above XSLT transform, when given the following XML input document,

<title>THE TITLE OF THE BOOK WITH SOME <i>ITALICS</i> AND SOME MORE
WORDS: THE SUBTITLE OF THE BOOK WITH SOME <i>ITALICS</i></title>

produces the following result:

<result>
   <title>the title of the book with some <i>italics</i> and some more words</title>
   <subtitle>the subtitle of the book with some <i>italics</i>
   </subtitle>
</result>

This solution follows a two-pass approach. In the first pass, each element
construct <i>text</i> is transformed into ##text## (assuming that the
delimiter ## does not otherwise occur in the input text). The second pass
transforms the result of the first pass into the final result.
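
For contrast, here is a sketch of splitting the same title while keeping the markup as nodes, so that neither the ## delimiter nor the plain-<i> restriction is needed. It is not the transform posted above. The mode name lower is hypothetical, and the sketch assumes the separating colon sits in a text node that is a direct child of title. Unlike the transform above, it does not normalize whitespace, so the spacing of the output will differ.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Hypothetical mode "lower": copies markup as-is, lower-cases text only. -->
   <xsl:mode name="lower" on-no-match="shallow-copy"/>

   <xsl:template match="text()" mode="lower">
      <xsl:value-of select="lower-case(.)"/>
   </xsl:template>

   <xsl:template match="title">
      <!-- Assumption: the first colon in a child text node is the separator;
           a colon inside i or other markup is not considered. -->
      <xsl:variable name="split" select="text()[contains(., ':')][1]"/>
      <result>
         <xsl:choose>
            <xsl:when test="exists($split)">
               <title>
                  <!-- Everything before the split node, then the text up to the colon. -->
                  <xsl:apply-templates select="node()[. &lt;&lt; $split]" mode="lower"/>
                  <xsl:value-of select="lower-case(substring-before($split, ':'))"/>
               </title>
               <subtitle>
                  <!-- The text after the colon, then everything after the split node. -->
                  <xsl:value-of select="lower-case(substring-after($split, ':'))"/>
                  <xsl:apply-templates select="node()[. &gt;&gt; $split]" mode="lower"/>
               </subtitle>
            </xsl:when>
            <xsl:otherwise>
               <!-- No colon found: keep the whole content as the title. -->
               <title>
                  <xsl:apply-templates select="node()" mode="lower"/>
               </title>
            </xsl:otherwise>
         </xsl:choose>
      </result>
   </xsl:template>

</xsl:stylesheet>
```

Because elements are copied rather than encoded as text, attributes on <i> and other inline elements are carried through unchanged. The trade-off is the assumption about where the colon may appear, which needs its own test cases.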



-- 
Regards,
Mukul Gandhi
XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>