Hello,
I have a situation where I need to convert mixed-content text, which also
contains arbitrary text within angle brackets, to XML output. For example,
texts like:
"Sometext <xx>within valid node</xx> and like <II .> Title etc"
"Sometext like <1a .> Title etc, <xx>within <b>something</b> valid node</xx>
etc".
Now, the output has to be like:
<nodename>Sometext <xx>within valid node</xx> and like <II .> Title
etc</nodename>
<nodename>Sometext like <1a .> Title etc, <xx>within <b>something</b>
valid node</xx> etc</nodename>
At present I do not get things like <br/>, but if I did, it being valid, I
should treat it as a node. The point I am trying to make is that non-node
things like <II .> and <1a .> need to have their angle brackets removed to
make the XML valid. Currently I use analyze-string with a regex to deal with
this, which does not work correctly (due to mistakes in the regex). But I
would like to know whether there is a good standard solution for dealing with
this sort of text. At present each line of text is passed to this template
and treated like:
<xsl:template name="tag-text">
  <xsl:param name="unparsed" required="yes"/>
  <!-- this regex has flaws, in that it fails to treat those invalid nodes -->
  <xsl:analyze-string select="$unparsed"
      regex="^(.*?)&lt;(.+)&gt;(.*)&lt;/(.+)&gt;(.*?)$">
    <xsl:matching-substring>
      <!-- process the groups and, if necessary, recursively call this
           template again -->
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
I suspect there could be a better regex to get the result I want, but I am
not sure whether XSLT itself has a better way to deal with this. Please can
you suggest possible solutions (including a better regex, if any of you have
used one successfully)?
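
For what it is worth, here is a minimal sketch of one direction that might
work. It is an illustration under assumptions, not a known-good solution.
Instead of matching whole open/close pairs, it matches only individual
tag-shaped tokens (an XML-like name with no attributes), escapes every other
"<", ">" and "&", and then hands the result to parse-xml-fragment(). That
function is XPath 3.0, so this assumes an XSLT 3.0 processor. The function
name my:escape-pseudo-tags and the namespace urn:example:tag-text are
hypothetical names made up for this sketch. It also escapes the stray
brackets rather than deleting them, which is an assumption about what is
acceptable in the output.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:tag-text"
    exclude-result-prefixes="xs my">

  <xsl:output method="xml" indent="no"/>

  <!-- Hypothetical helper: keeps tag-shaped tokens such as <xx>, </xx> and
       <br/> untouched and escapes &, < and > everywhere else, so that
       pseudo-tags like <II .> or <1a .> become plain text. -->
  <xsl:function name="my:escape-pseudo-tags" as="xs:string">
    <xsl:param name="unparsed" as="xs:string"/>
    <xsl:variable name="parts" as="xs:string*">
      <!-- First alternative: end tag. Second: start or empty-element tag.
           Assumption: tags carry no attributes and names are ASCII. -->
      <xsl:analyze-string select="$unparsed"
          regex="&lt;/[A-Za-z_][A-Za-z0-9_.\-]*\s*&gt;|&lt;[A-Za-z_][A-Za-z0-9_.\-]*\s*/?&gt;">
        <xsl:matching-substring>
          <xsl:sequence select="."/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:sequence select="replace(replace(replace(.,
              '&amp;', '&amp;amp;'), '&lt;', '&amp;lt;'), '&gt;', '&amp;gt;')"/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

  <!-- Same name and parameter as the template above; the body is the sketch. -->
  <xsl:template name="tag-text">
    <xsl:param name="unparsed" as="xs:string" required="yes"/>
    <nodename>
      <!-- The parsed fragment's children are copied into nodename. -->
      <xsl:sequence
          select="parse-xml-fragment(my:escape-pseudo-tags($unparsed))"/>
    </nodename>
  </xsl:template>

  <!-- Entry point for trying the sketch without a source document. -->
  <xsl:template name="xsl:initial-template">
    <xsl:call-template name="tag-text">
      <xsl:with-param name="unparsed"
          select="'Sometext &lt;xx&gt;within valid node&lt;/xx&gt; and like &lt;II .&gt; Title etc'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

With the first sample line, this should produce a nodename element in which
xx is a real child element and <II .> survives only as escaped text. Two
limitations are worth noting. A tag with attributes would not match the
pattern and would be escaped as text. A tag-shaped token that is never
closed (or is closed in the wrong order) would make parse-xml-fragment()
raise an error, so unbalanced input would still need separate handling.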
Thanks in advance,
Karl