xsl-list

RE: [xsl] Handling Non Well conformed HTML content

2006-10-03 06:12:22
I have a typical issue in handling HTML content in an XML document
with the structure below: I want to replace the placeholders in the HTML
template with the text of the respective elements.
The HTML is not well-formed.

Before you can process the HTML, you will have to turn it into well-formed
XML. You can do this using the JTidy utility.

For that reason, we are Base64-encoding
the HTML content.

You'll have to find a Base64 decoder. Details will depend on your processing
environment, e.g. whether it's Java, Microsoft, or whatever.

However, I can't relate either of those points to the example you show
below.

Please suggest a resolution.
The replacement content might be in any part of the document.
Any suggestions are welcome.

Input content
<?xml version="1.0" encoding="UTF-8"?>
<broadcast>
  <content_vars>
    <content name="subject"><html>Hello [[BUYERS_NAME]]</html></content><!--encoded-->
    <content name="text">REF Order [WEB_ORDER_NUMBER]</content><!--encoded-->
  </content_vars>

  <ORDER_FEED>
    <ORDER>
      <ORDER_HEADER>
        <BUYERS_NAME>Senthil</BUYERS_NAME>
        <WEB_ORDER_NUMBER>W12345</WEB_ORDER_NUMBER>
      </ORDER_HEADER>
      <!--Line Items-->
    </ORDER>
  </ORDER_FEED>
</broadcast>

XSLT I tried
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" indent="yes" />

<xsl:template match="/broadcast">
        <xsl:apply-templates select="content_vars/content" />
</xsl:template>

<xsl:template match="content">
     <xsl:variable name="temp1" select="translate(., '[]', '')" />
     <xsl:variable name="temp2"
          select="normalize-space(../following-sibling::*[contains($temp1, local-name())])" />
     <xsl:variable name="temp3"
          select="local-name(../following-sibling::*[contains($temp1, local-name())])" />
     <xsl:value-of select="substring-before($temp1, $temp3)" />
     <xsl:value-of select="$temp2" />
     <xsl:value-of select="substring-after($temp1, $temp3)" />
</xsl:template>

</xsl:stylesheet>

Expected output
<html>
Hello Senthil
REF Order W12345
</html>

But I am getting this unexpected output
<html>
Hello BUYERS_NAME
REF Order WEB_ORDER_NUMBER
</html>
Please let me know how to tweak the code so that it works as desired.

I think it's more than a tweak. Your main mistake is using the
following-sibling axis rather than following (the BUYERS_NAME element is not
a sibling of the content_vars element). But also, your code seems generally
lacking in robustness. You're ignoring both the HTML tagging and the [[...]]
markers (or [...] depending which of the two examples we look at); you're
assuming that there will only be one insert in each element, and that its
name won't clash with any other textual content in the element. This all
seems pretty poor coding.
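The first point can be checked by hand: replacing following-sibling with following in temp2 and temp3 does make BUYERS_NAME resolve, but for the second item the predicate contains($temp1, local-name()) is first satisfied by the ORDER element (its name occurs inside WEB_ORDER_NUMBER), which is exactly the name clash described above. The stylesheet below is a minimal XSLT 1.0 sketch of one more robust approach, not a definitive solution. It assumes the content has already been Base64-decoded and tidied into well-formed XML, and that a placeholder refers to the first element of that name anywhere in the document. The key name by-name, the named template substitute, and the newline between items are hypothetical choices, not part of the original post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Hypothetical key: index every element by name, so a placeholder can be
       resolved from anywhere in the document, not just among siblings. -->
  <xsl:key name="by-name" match="*" use="local-name()"/>

  <xsl:template match="/broadcast">
    <html>
      <xsl:apply-templates select="content_vars/content"/>
    </html>
  </xsl:template>

  <!-- One output line per content item (the newline is an assumption). -->
  <xsl:template match="content">
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Unwrap an embedded <html> element but keep processing its children;
       the explicit priority beats the generic copy template below. -->
  <xsl:template match="content//html" priority="2">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Copy any other embedded markup (<b>, <p>, ...) through unchanged. -->
  <xsl:template match="content//*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Placeholders are only looked for in text nodes inside content items. -->
  <xsl:template match="content//text()">
    <xsl:call-template name="substitute">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: recursively replace every [NAME] or [[NAME]]
       marker with the normalized string value of the first element NAME. -->
  <xsl:template name="substitute">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains(substring-after($text, '['), ']')">
        <xsl:value-of select="substring-before($text, '[')"/>
        <xsl:variable name="rest" select="substring-after($text, '[')"/>
        <!-- Strip a second '[' so [[NAME]] and [NAME] give the same name. -->
        <xsl:variable name="name"
                      select="translate(substring-before($rest, ']'), '[', '')"/>
        <xsl:variable name="after-first" select="substring-after($rest, ']')"/>
        <!-- For [[NAME]], also consume the second closing bracket. -->
        <xsl:variable name="after">
          <xsl:choose>
            <xsl:when test="starts-with($rest, '[') and starts-with($after-first, ']')">
              <xsl:value-of select="substring($after-first, 2)"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$after-first"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>
        <xsl:variable name="value" select="key('by-name', $name)[1]"/>
        <xsl:choose>
          <xsl:when test="$value">
            <xsl:value-of select="normalize-space($value)"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Unknown name: emit the marker exactly as it was written. -->
            <xsl:value-of select="concat('[', substring($rest, 1,
                                  string-length($rest) - string-length($after)))"/>
          </xsl:otherwise>
        </xsl:choose>
        <xsl:call-template name="substitute">
          <xsl:with-param name="text" select="string($after)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Because the scan is recursive, several placeholders in one element are handled, and because the lookup uses the exact bracketed name rather than contains(), ORDER can no longer be mistaken for WEB_ORDER_NUMBER.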

Michael Kay
http://www.saxonica.com/


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
