xsl-list

Re: [xsl] trouble with preceding:: and parsing xhtml

2009-10-04 03:23:15
I solved my last issue.  When I initially performed this transformation
with command-line tools, I forgot that I was cleaning the original HTML
not only by running "HtmlCleaner", but also by using "sed" to rename all attributes
named "id", because there were multiple occurrences with the same value.
That, of course, is not allowed for the "id" attribute,
since parsers enforce uniqueness of its values.

So after I added:

                Object[] fieldNodes = result.evaluateXPath("//*[@id='field']");
                for (Object node : fieldNodes) {
                        if (node instanceof TagNode) {
                                //System.out.println(((TagNode)node).getName());
                                ((TagNode)node).removeAttribute("id");
                                ((TagNode)node).addAttribute("tid", "field");
                        }                       
                }

With that change, the Java code produces the same output as the command-line
tools, with both xalan and saxon.  As mentioned in my last email, though, in
order to use saxon I have to use xerces as the parser.

   -Chris 

Chris Wolf wrote:
Unfortunately, after I moved the application to Java (xalan, whatever is
baked into jdk-1.5.x), it still renders *some* nodes with
preceding::div[@tid='field'][1]
with the value of the first node. For those, I tried flipping it by
replacing "[1]" with "[last()]" again, but that hack only worked for some nodes.

Other than when run programmatically, the stylesheet works perfectly fine with
"xsltproc" (MacOS/Linux) and "msxsl" on Windoze.

I also tried your Saxon-6.5.5 which works fine from the command line,
i.e. java -jar /opt/saxon-6.5.5/saxon.jar af.xhtml fbdata.xsl

...works.  Unfortunately, I get the same weird results when I replace
the default "javax.xml.transform.TransformerFactory" impl with
"com.icl.saxon.TransformerFactoryImpl".

Actually, saxon won't even read the xsl file unless I override and revert
the parser back to the built-in jdk (xerces) parser.  Unless I do that,
I get:

      at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:121)
      at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
      at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
      at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)
Caused by: java.io.EOFException: no more input
      at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
      at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
      at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
      at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
      at com.icl.saxon.om.Builder.build(Builder.java:265)
      at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
      ... 3 more
---------
java.io.EOFException: no more input
      at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
      at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
      at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
      at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
      at com.icl.saxon.om.Builder.build(Builder.java:265)
      at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
      at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
      at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
      at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)


Overriding the parser to be 
"com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"
fixes this, but the resulting transformation does not look anything like what
I get from that command line.

I'm using saxon-6.5.5 like this:

System.setProperty("javax.xml.transform.TransformerFactory", 
      "com.icl.saxon.TransformerFactoryImpl");
System.setProperty("javax.xml.parsers.SAXParserFactory", 
      "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");

HtmlCleaner cleaner = new HtmlCleaner();
TagNode result = cleaner.clean(new File(fin), "utf-8");
Document doc = new DomSerializer(cleaner.getProperties(), true).createDOM(result);

TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource ss = new StreamSource(xsl);
Transformer xform = tFactory.newTransformer(ss);
StringWriter sw = new StringWriter();
StreamResult sr = new StreamResult(sw);

xform.transform(new DOMSource(doc), sr);
sw.flush();
System.out.println(sw.toString());
      

BTW, when I ran saxon successfully from the command line, I fed it a document
produced by HtmlCleaner, also from the command line, via:
java -jar /opt/jlib/htmlcleaner2_1.jar src=countrytemplate_af.html dest=af.data outcharset=utf-8



Thanks,

  -Chris W.

Michael Kay wrote:
You're nearly there: you want  

preceding::div[@tid='field'][1]

Without the [1], you select all of them throughout the document; and if you
then use something like xsl:value-of (in XSLT 1.0) then you get the one that
is first in document order.

Then I tried preceding::div[@tid='field' and last()]
last() always gives a number that is 1 or more. "and last()" converts this
number to a boolean, and any number other than 0 (or NaN) is treated as true. So
you're adding "and true()" to your predicate, which doesn't change its
result. You were probably thinking of

preceding::div[@tid='field'][last()]

which means

preceding::div[@tid='field'][position() = last()]

But numeric predicates attached to a reverse axis step count the nodes in
reverse document order: 1 is the nearest, and last() is the furthest. So the
correct predicate is [1].

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 
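
To see the predicate rules described above in action, here is a minimal
sketch using the XPath API that ships with the JDK. The sample document, the
class name PrecedingAxisDemo and the helper fromSecondValue are made up for
illustration and are not taken from the thread; the sample also leaves out
the XHTML namespace to keep the expressions short. Each expression is
evaluated with the second "category_data" div as the context node, and the
result is converted to a string the same way xsl:value-of does in XSLT 1.0
(the first node in document order wins).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PrecedingAxisDemo {

    // Hypothetical sample: two field/value pairs, no namespace, for brevity.
    static final String SAMPLE =
        "<body>"
        + "<div tid='field'><a>Foo</a></div>"
        + "<table><tr><td><div class='category_data'>one</div></td></tr></table>"
        + "<div tid='field'><a>Bar</a></div>"
        + "<table><tr><td><div class='category_data'>two</div></td></tr></table>"
        + "</body>";

    /**
     * Evaluates expr with the SECOND category_data div as the context node
     * and returns the result as a string. The string value of a node-set is
     * the string value of its first node in document order.
     */
    static String fromSecondValue(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(
                "(//div[@class='category_data'])[2]", doc, XPathConstants.NODE);
            return xpath.evaluate(expr, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "preceding::div[@tid='field']",            // all matches      -> Foo
            "preceding::div[@tid='field' and last()]", // last() is "true" -> Foo
            "preceding::div[@tid='field'][last()]",    // furthest match   -> Foo
            "preceding::div[@tid='field'][1]"          // nearest match    -> Bar
        };
        for (String expr : exprs) {
            System.out.println(expr + " -> " + fromSecondValue(expr));
        }
    }
}
```

Only the last expression returns "Bar", the field div nearest to the context
node; the other three all end up reporting the first field div in the document.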
 

-----Original Message-----
From: Chris Wolf [mailto:cw10025(_at_)gmail(_dot_)com] 
Sent: 03 October 2009 20:37
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] trouble with preceding:: and parsing xhtml

I have some xhtml documents that I want to process with XSL.  
The patterns that I'm interested in have a series of 
occurrences of "div" elements in pairs, as in:

<xhtml...>
<head/>
<body..>
<table...>
<tr..>
<td...>
<div tid="field"><a href="...">Foo</a></div>
<table...>
<tr...>
<td...>
<div class="category_data">Bla,Bla,Bla</div>
<...>

this pattern of the two pairs of div variations repeats an 
arbitrary number of times throughout the document and there 
could be other "div" elements interspersed, but not with the 
same qualifying attributes.


Note that the "div" with "class='category_data'" is not a 
descendant of the first "div[@tid='field']".
I don't think these pairs of DIVs are siblings either (at the 
same level).

Basically, I'm trying to generate XML of name-value pairs 
where the name
comes from the content of the <a/> in the first 
"div[@tid='field']", and the value is the
content of the second "div[@class='category_data']".

So the output should be:
<Field name="Foo">Bla,Bla,Bla</Field>

Where the value of the "name" attribute is the content of the 
input doc's
div[@tid='field']/a, i.e. in this example, 'Foo'

...and the content of "Field" is the content of the input doc's
div[@class='category_data']



Since the second div is not a descendant of the first, I 
can't capture 
the <a/> content in a variable and call <xsl:apply-templates 
select="div[@class='category_data']"/>
with a parameter.

The question is how else to pass data from one template to 
another template?

I tried "reaching back" from the second template by using 
preceding::div[@tid='field']
but this retrieved the value of the first node matching 
"div[@tid='field']", not
the immediately preceding node that matches, as I would have 
expected.  Then I tried
preceding::div[@tid='field' and last()] - same result; always 
the same value and
always the value of the very first node that matched.

I guess I have no idea how "preceding::" is supposed to work.


I would greatly appreciate any help.  

Thanks,

   -Chris

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="div a"/>

<xsl:template match="/">
  <xsl:message>***** ROOT</xsl:message>
    <xsl:apply-templates select="//h:div"/>
</xsl:template>

<xsl:template match="h:div[@tid='field']">
  <xsl:message>***** DIV1</xsl:message>
  <xsl:apply-templates select="h:div"/>
</xsl:template>

<xsl:template match="h:div[@class='category_data']">
  <xsl:param name="fname"/>
  <xsl:message>***** DIV2</xsl:message>
  <xsl:message>^<xsl:value-of select="preceding::h:div[@tid='field']"/>^</xsl:message>
  <xsl:element name="Field">
    <xsl:attribute name="name">
      <xsl:value-of select="preceding::h:div[@tid='field']"/>
    </xsl:attribute>
    <xsl:value-of select="."/>
  </xsl:element><xsl:text>
</xsl:text>
        <xsl:apply-templates/>
</xsl:template>

<xsl:template match="text()">
  <xsl:message>***** TEXT</xsl:message>
    <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>
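
For comparison, the name-value pairing that this stylesheet is aiming for can
be sketched with plain JAXP XPath calls instead of XSLT. This is only an
illustration under stated assumptions: the class FieldPairs, the method
extractFields and the sample input are hypothetical, the "h" prefix is bound
by hand to mirror xmlns:h in the stylesheet, and the output is built by simple
string concatenation with no XML escaping. It looks up the nearest preceding
field div with a [1] predicate on the preceding:: axis.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FieldPairs {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Binds the "h" prefix to the XHTML namespace, like xmlns:h in the stylesheet.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "h" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                ? Collections.<String>emptyList().iterator()
                : Collections.singletonList(prefix).iterator();
        }
    };

    /** Returns one "<Field name=...>value</Field>" string per category_data div. */
    static List<String> extractFields(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, h:div matches nothing
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);

            NodeList values = (NodeList) xpath.evaluate(
                "//h:div[@class='category_data']", doc, XPathConstants.NODESET);
            List<String> fields = new ArrayList<String>();
            for (int i = 0; i < values.getLength(); i++) {
                Node value = values.item(i);
                // [1] on a reverse axis selects the NEAREST preceding field div.
                String name = xpath.evaluate(
                    "normalize-space(preceding::h:div[@tid='field'][1]/h:a)", value);
                // Illustration only: no XML escaping is applied here.
                fields.add("<Field name=\"" + name + "\">"
                    + value.getTextContent().trim() + "</Field>");
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input modelled on the pattern described in the question.
        String sample =
            "<html xmlns='http://www.w3.org/1999/xhtml'><head/><body>"
            + "<table><tr><td>"
            + "<div tid='field'><a href='#'>Foo</a></div>"
            + "<table><tr><td><div class='category_data'>Bla,Bla,Bla</div></td></tr></table>"
            + "</td></tr></table>"
            + "</body></html>";
        for (String field : extractFields(sample)) {
            System.out.println(field); // prints <Field name="Foo">Bla,Bla,Bla</Field>
        }
    }
}
```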


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

