Jon,
I remember your similar post from a few days ago and had set it aside to
look at in time; today's repost prompted me to do so. Well, the bad news is
that while I think there's an elegant, recursive, etc. XSL solution, I've
seen it only hazily in the past half-hour.
The good news is that this is readily amenable to a procedural solution, if
you'll accept one, as implemented in the following Jython script. The idea
is to write out each line while maintaining a "stack" of the open tags found
so far. At each <br/>, we write closing tags for everything on the stack
before it and re-opening tags after it.
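Taken on its own, the step performed at each <br/> might look like the
following minimal Python sketch. The function name `emit_at_br` and the
(name, start-tag) pairs kept on the stack are illustrative choices, not part
of the script:

```python
def emit_at_br(stack, br_line):
    """Hypothetical helper: return the lines to write at a <br/>.

    `stack` holds (element name, start-tag text) pairs, outermost first.
    Every open element is closed innermost-first, the <br/> is written,
    then the same elements are re-opened outermost-first.
    """
    closing = ["</%s>" % name for name, _ in reversed(stack)]
    reopening = [start_tag for _, start_tag in stack]
    return closing + [br_line] + reopening


stack = [("strong", "<strong>"), ("span", '<span style="a style">')]
for out_line in emit_at_br(stack, "<br/>"):
    print(out_line)
```

The reversal matters: closing tags must come out innermost-first for the
result to stay well-formed, while re-opening goes outermost-first.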
The script below doesn't use an XML parser, but it could readily be adapted
to use a SAX parser. You would then no longer need the additional
simplifying assumption that element start-tags are on lines by themselves.
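Here is a minimal sketch of what the SAX variant might look like. It assumes
Python's standard `xml.sax` module rather than a Java parser under Jython.
`BrSplitter` and `split_at_br` are hypothetical names. Like the script, it
treats the outermost element as the root and never splits it. It also
silently drops comments and processing instructions, which a real version
would need to handle:

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class BrSplitter(xml.sax.ContentHandler):
    """Hypothetical SAX handler: before each <br/>, close every element open
    below the root; after it, re-open them. The root itself is never split."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.out = []     # output fragments, joined at the end
        self.stack = []   # (name, attribute text) of open elements below the root
        self.depth = 0    # nesting depth of non-<br> elements, root included

    def startElement(self, name, attrs):
        attr_text = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs.items())
        if name == "br":
            for open_name, _ in reversed(self.stack):
                self.out.append("</%s>" % open_name)
            self.out.append("<br%s/>" % attr_text)
            for open_name, open_attrs in self.stack:
                self.out.append("<%s%s>" % (open_name, open_attrs))
            return
        self.out.append("<%s%s>" % (name, attr_text))
        if self.depth > 0:        # don't track the root element
            self.stack.append((name, attr_text))
        self.depth += 1

    def endElement(self, name):
        if name == "br":
            return                # already written as an empty element
        self.depth -= 1
        if self.depth > 0:
            self.stack.pop()
        self.out.append("</%s>" % name)

    def characters(self, content):
        self.out.append(escape(content))


def split_at_br(xml_text):
    handler = BrSplitter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


print(split_at_br('<p><span style="a">x<span style="b">y<br name="b"/>z</span>w</span></p>'))
```

Because the parser reports start-tags, end-tags and text separately, it no
longer matters how the input is broken into lines.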
Given your input below:
<p>
<span style="style a">
span a text 1
<span style="style b">
span b pre br text
<br name="b"/>
span b post br text
</span>
span a text 2
</span>
</p>
it produces:
<p>
<span style="style a">
span a text 1
<span style="style b">
span b pre br text
</span>
</span>
<br name="b"/>
<span style="style a">
<span style="style b">
span b post br text
</span>
span a text 2
</span>
</p>
which is what you asked for, apart from the empty <span style="style a"/>
in your version:
------
<p>
<span style="style a">
span a text 1
<span style="style b">
span b pre br text
</span>
</span>
<span style="style a"/>
<br name="b"/>
<span style="style a">
<span style="style b">
span b post br text
</span>
span a text 2
</span>
</p>
------
And given the input from your original email (reformatted so that start-tags
are on lines by themselves):
<p>
<strong>
strong:text(top)
<br/>
prefix
<span style="a style">
span a
<span style="rgb();">
span b
<br/>
text
</span>
text
</span>
strong:text(btm)
<br/>
suffix
<br/>
</strong>
Root level text with
<br/>
tag.
</p>
it produces this:
<p>
<strong>
strong:text(top)
</strong>
<br/>
<strong>
prefix
<span style="a style">
span a
<span style="rgb();">
span b
</span>
</span>
</strong>
<br/>
<strong>
<span style="a style">
<span style="rgb();">
text
</span>
text
</span>
strong:text(btm)
</strong>
<br/>
<strong>
suffix
</strong>
<br/>
<strong>
</strong>
Root level text with
<br/>
tag.
</p>
Here's the script (which is more illustrative than production-worthy):
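import re

# Simplifying assumptions: every start-tag, end-tag and <br/> is on a line
# by itself, and the first line holds the root element (which is never split).
BR_RE    = re.compile(r"\s*<br\b[^>]*>$")         # <br/>, <br name="b"/>
START_RE = re.compile(r"\s*<\w+[^>]*(?<!/)>$")    # start-tag, not self-closing
END_RE   = re.compile(r"\s*</\w+[^>]*>$")         # end-tag
NAME_RE  = re.compile(r"(?P<TAB>\s*)<(?P<NAME>\w+)")

stack = []  # start-tag lines of the elements currently open below the root
f = open(r'e:\temp\test.xml')
try:
    for linecount, raw in enumerate(f):
        line = raw.rstrip()
        is_br = False
        if linecount == 0:
            pass  # root element: echoed, but never closed/re-opened
        elif BR_RE.match(line):
            is_br = True
        elif START_RE.match(line):
            stack.append(line)
        elif END_RE.match(line):
            if stack:  # the root's end-tag arrives with an empty stack
                stack.pop()
        if is_br:
            # write closing tags, innermost first
            for t in reversed(stack):
                m = NAME_RE.match(t)
                print("%s</%s>" % (m.group('TAB'), m.group('NAME')))
        print(line)
        if is_br:
            # write opening tags again, outermost first
            for t in stack:
                print(t)
finally:
    f.close()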
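The script reads from a hard-coded file and prints as it goes, which makes it
awkward to check against the examples. A minimal variant wraps the same
line-based logic in a hypothetical `split_lines` function that takes and
returns a list of lines. It matches start-tags with `<\w+` rather than
`<?\w+`, so a text line ending in `>` is not mistaken for a start-tag, and it
skips self-closing tags. It then checks the function against your first
example:

```python
import re

BR_RE = re.compile(r"\s*<br\b[^>]*>$")          # <br/>, <br name="b"/>
START_RE = re.compile(r"\s*<\w+[^>]*(?<!/)>$")  # start-tag, not self-closing
END_RE = re.compile(r"\s*</\w+[^>]*>$")         # end-tag
NAME_RE = re.compile(r"(?P<TAB>\s*)<(?P<NAME>\w+)")


def split_lines(lines):
    """Same line-based idea as the script; the first line is the root element."""
    out, stack = [], []
    for i, raw in enumerate(lines):
        line = raw.rstrip()
        is_br = False
        if i == 0:
            pass                      # root element: never split
        elif BR_RE.match(line):
            is_br = True
        elif START_RE.match(line):
            stack.append(line)
        elif END_RE.match(line) and stack:
            stack.pop()
        if is_br:                     # close open elements, innermost first
            for t in reversed(stack):
                m = NAME_RE.match(t)
                out.append("%s</%s>" % (m.group("TAB"), m.group("NAME")))
        out.append(line)
        if is_br:                     # re-open them, outermost first
            out.extend(stack)
    return out


example = ['<p>', '<span style="style a">', 'span a text 1',
           '<span style="style b">', 'span b pre br text', '<br name="b"/>',
           'span b post br text', '</span>', 'span a text 2', '</span>', '</p>']
for out_line in split_lines(example):
    print(out_line)
```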
Regards,
--A
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--