xsl-list

RE: element to root

2005-05-31 13:44:33
Jon,

I remember your similar post from a few days ago and had set it aside to look at when I had time; today's repost prompted me to do so. The bad news is that while I think there is an elegant, recursive XSL solution, I have glimpsed it only hazily in the past half-hour.

The good news is that this is readily amenable to a procedural solution, if you'll accept one, as implemented in the following Jython script. The idea is that we write out each line, maintain a "stack" of the start-tags that are currently open, and at each <br/> write closing tags for those elements before it and re-open them after it.
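To make the close-and-reopen step concrete, here is a minimal sketch in plain Python. The helper name close_and_reopen and the sample stack are made up for illustration; it only shows what gets written around a single <br/> for a given stack of open start-tags.

```python
import re

def close_and_reopen(open_tags):
    """Given the open start-tags (outermost first), return the end-tags to
    write before a <br/> and the start-tags to write again after it."""
    closing = []
    for tag in reversed(open_tags):          # innermost element closes first
        name = re.match(r"\s*<([^\s/>]+)", tag).group(1)
        closing.append("</%s>" % name)
    reopening = list(open_tags)              # same tags, outermost first
    return closing, reopening

# Hypothetical stack at the moment a <br/> is met
stack = ['<strong>', '<span style="a style">']
closing, reopening = close_and_reopen(stack)
print(closing)
print(reopening)
```

The attributes survive because the whole start-tag is kept on the stack, not just the element name.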

The script below doesn't use an XML parser, but could readily be adapted to use a SAX parser, in which case you wouldn't need the additional simplifying assumption that element start-tags are on lines by themselves.
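A SAX-based variant might look roughly like the sketch below. It is written for CPython 3 with the standard xml.sax module, not Jython, and it is an illustration under stated assumptions: the class name BrLifter and the function lift_br are invented here, <br> is assumed to be empty and never the root element, and namespaces are ignored.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class BrLifter(xml.sax.ContentHandler):
    """Hypothetical handler: closes all open elements below the root before
    each <br>, then re-opens them after it."""

    def __init__(self, out):
        xml.sax.ContentHandler.__init__(self)
        self.out = out
        self.stack = []   # (name, attrs) of open elements below the root
        self.depth = 0    # current element nesting depth

    def _write_start(self, name, attrs, empty=False):
        parts = [name] + ["%s=%s" % (k, quoteattr(v)) for k, v in attrs.items()]
        self.out.write("<%s%s>" % (" ".join(parts), "/" if empty else ""))

    def startElement(self, name, attrs):
        self.depth += 1
        if name == "br":
            for open_name, _ in reversed(self.stack):   # close innermost first
                self.out.write("</%s>" % open_name)
            self._write_start(name, attrs, empty=True)
            for open_name, open_attrs in self.stack:    # re-open outermost first
                self._write_start(open_name, open_attrs)
        else:
            if self.depth > 1:                          # keep the root off the stack
                self.stack.append((name, dict(attrs.items())))
            self._write_start(name, attrs)

    def endElement(self, name):
        self.depth -= 1
        if name == "br":
            return                                      # already written as <br/>
        if self.depth >= 1:                             # a non-root element closed
            self.stack.pop()
        self.out.write("</%s>" % name)

    def characters(self, content):
        self.out.write(escape(content))

def lift_br(xml_text):
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), BrLifter(out))
    return out.getvalue()

print(lift_br('<p><b>one<br/>two</b></p>'))
```

Because the parser reports elements rather than lines, the input no longer has to be laid out with one start-tag per line; the trade-off is that the original indentation is reproduced only as far as the whitespace text nodes carry it.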

Given your input below:


<p>
  <span style="style a">
    span a text 1
    <span style="style b">
      span b pre br text
      <br name="b"/>
      span b post br text
    </span>
    span a text 2
  </span>
</p>

it produces:

<p>
 <span style="style a">
   span a text 1
   <span style="style b">
     span b pre br text
   </span>
 </span>

     <br name="b"/>

 <span style="style a">
   <span style="style b">
     span b post br text
   </span>
   span a text 2
 </span>
</p>

which is what you asked for (apart from indentation and the empty <span style="style a"/>):

------
<p>
  <span style="style a">
    span a text 1
    <span style="style b">
      span b pre br text
    </span>
  </span>
  <span style="style a"/>
  <br name="b"/>
  <span style="style a">
    <span style="style b">
      span b post br text
    </span>
    span a text 2
  </span>
</p>
------


And given the input from your original email (reformatted so that start-tags are on lines by themselves):

<p>
 <strong>
   strong:text(top)
   <br/>
   prefix
     <span style="a style">
       span a
       <span style="rgb();">
         span b
         <br/>
         text
       </span>
       text
      </span>
   strong:text(btm)
   <br/>
   suffix
   <br/>
 </strong>
 Root level text with
 <br/>
 tag.
</p>

it produces this:

<p>
 <strong>
   strong:text(top)
 </strong>

   <br/>

 <strong>
   prefix
     <span style="a style">
       span a
       <span style="rgb();">
         span b
       </span>
     </span>
 </strong>

         <br/>

 <strong>
     <span style="a style">
       <span style="rgb();">
         text
       </span>
       text
      </span>
   strong:text(btm)
 </strong>

   <br/>

 <strong>
   suffix
 </strong>

   <br/>

 <strong>
 </strong>
 Root level text with

 <br/>

 tag.
</p>


Here's the script (which is more illustrative than production-worthy):

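import re

# Start-tag lines of the elements currently open, outermost first.
stack = []
linecount = 0
f = open(r'e:\temp\test.xml')

try:
    for raw in f:
        line = raw.rstrip()
        if not line:
            continue    # skip blank lines rather than stopping at the first one

        linecount += 1
        is_br = False

        if linecount == 1:
            pass        # root start-tag: kept off the stack, so it is never closed and re-opened
        elif re.match(r"\s*<br\b[^>]*>$", line):
            is_br = True
        elif re.match(r"\s*<\w[^>]*>$", line) and not line.endswith("/>"):  # start-tag
            stack.append(line)
        elif re.match(r"\s*</\w[^>]*>$", line) and stack:  # end-tag
            stack.pop()

        if is_br:
            # write closing tags, innermost element first
            for t in stack[::-1]:
                m = re.match(r"(?P<TAB>\s*)<(?P<NAME>[^\s/>]+)", t)
                print("%s</%s>" % (m.group('TAB'), m.group('NAME')))

        print(line)     # single-argument print() also works as a Jython print statement

        if is_br:
            # write opening tags again, outermost element first
            for t in stack:
                print(t)

finally:
    f.close()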


Regards,

--A
