xsl-list

Re: [xsl] Dealing mixed content with invalid node-like text

2011-12-06 16:42:17
Hello David,

Yes, I do process the content in two stages: I preprocess it into one form of
XML and then process that further into my final XML form. BUT both stages are
done in XSL, in one single file, and the problem I reported occurs in the
first-stage conversion itself. To make things clearer, here is a rough skeleton
and explanation of my process. I read the entire content of the input into a
variable, $input-text, and then tokenize it so that each line of data ends up
in another variable, as below.

<xsl:variable name="lines" select="tokenize($input-text, '\r?\n')"/>

<!-- then pass it to another template to process each line of data: -->
<xsl:call-template name="process-lines">
    <xsl:with-param name="lines" select="$lines"/>
</xsl:call-template>

<!-- And here, I further process it to select the REQUIRED value: -->
<xsl:template name="process-lines">
    <xsl:param name="lines" as="xs:string*"/>

    <xsl:for-each select="$lines">
        <xsl:variable name="line-components" select="tokenize(., '\t')"/>

        <!-- only the last tab-separated field of each line is tagged -->
        <xsl:for-each select="$line-components[position() = last()]">
            <value>
                <xsl:call-template name="tag-text">
                    <xsl:with-param name="unparsed" select="."/>
                </xsl:call-template>
            </value>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>

<!-- AND IT IS HERE, in this "tag-text" template, that I try to achieve what I
     explained in my original posting -->
<xsl:template name="tag-text">
    <xsl:param name="unparsed" required="yes"/>
    <xsl:analyze-string select="$unparsed"
        regex="^(.*?)&lt;(.+)&gt;(.*)&lt;/(.+)&gt;(.*?)$">

       etc as posted earlier. 

The skeleton input will be like (as I mentioned before):

Line one text <b>within valid node</b> and like <II .> Title etc
Line two with <1a .> Title etc, <i>within</i> <b>something</b> etc
another line can be just normal text
....

It is vital that I get the data in this form at this point, so that our final
output in stage two is correct. In view of this, I cannot use <xsl:value-of>
with disable-output-escaping (d-o-e) here. As it seems likely that this cannot
be achieved in XSL, I am trying to get my source corrected. But if there is a
solution available, in XSL or with a better regex, I would be happy to use it.
I hope the above clarifies your question.
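
One direction that might be worth trying, offered only as a sketch: if the set
of genuine inline elements is known in advance, the regex can match just those
names with a back-reference for the closing tag, and leave everything else
(including node-like text such as <II .> or <1a .>) as plain text. The
whitelist of b and i, the $sample parameter and the "main" entry template below
are assumptions made for illustration; they are not part of the original
stylesheet.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="no"/>

  <!-- Hypothetical sample line, standing in for one entry of $line-components -->
  <xsl:param name="sample" as="xs:string"
      select="'Line two with &lt;1a .&gt; Title etc, &lt;i&gt;within&lt;/i&gt; &lt;b&gt;something&lt;/b&gt; etc'"/>

  <!-- Hypothetical entry point so the sketch can be run without a source document -->
  <xsl:template name="main">
    <value>
      <xsl:call-template name="tag-text">
        <xsl:with-param name="unparsed" select="$sample"/>
      </xsl:call-template>
    </value>
  </xsl:template>

  <xsl:template name="tag-text">
    <xsl:param name="unparsed" as="xs:string" required="yes"/>
    <!-- Assumption: only <b> and <i> are real markup. \1 forces the closing
         tag to carry the same name as the opening tag. -->
    <xsl:analyze-string select="$unparsed"
        regex="&lt;(b|i)&gt;(.*?)&lt;/\1&gt;">
      <xsl:matching-substring>
        <!-- Rebuild the element, then recurse into its content so that
             differently named nested tags are converted as well -->
        <xsl:element name="{regex-group(1)}">
          <xsl:call-template name="tag-text">
            <xsl:with-param name="unparsed" select="regex-group(2)"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Anything else, including invalid node-like text, stays text
             and is escaped by the serializer -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor started at the named template (for example Saxon's
-it:main option), <i>within</i> and <b>something</b> should become real
elements inside <value>, while <1a .> should stay as escaped text. The sketch
does not handle tags with attributes or an element nested inside another of the
same name, so it would need extending if the source contains those.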

Thanks,
Karl


----- Original Message -----
From: David Carlisle <davidc(_at_)nag(_dot_)co(_dot_)uk>
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Re: Dealing mixed content with invalid node-like text


and you can assume it as something like a text file format

but your post said that you were using xsl:analyze-string, which means that you 
must somehow be pre-processing your text format into XML before it gets to XSLT 
as otherwise the input would not be well formed and XSLT would not even start. 
We can't help with the XSLT question you asked unless we know what the input 
looks like _to XSLT_.

David       
