Re: [xsl] trouble with preceding:: and parsing xhtml
2009-10-04 03:23:15
I solved my last issue. When I first ran this transformation with the
command-line tools, I forgot that I was cleaning the original HTML not only
by running "HtmlCleaner", but also by using "sed" to rename every attribute
named "id", because there were multiple occurrences with the same value.
That is not allowed for the "id" attribute, since parsers enforce
uniqueness of its values.
So after I added:
Object[] fieldNodes = result.evaluateXPath("//*[@id='field']");
for (Object node : fieldNodes) {
    if (node instanceof TagNode) {
        //System.out.println(((TagNode)node).getName());
        ((TagNode)node).removeAttribute("id");
        ((TagNode)node).addAttribute("tid", "field");
    }
}
It worked as it does with the command-line tools, with both xalan and
saxon. However, as mentioned in my last email, in order to use saxon I have
to use xerces as the parser.
-Chris
Chris Wolf wrote:
Unfortunately, after I moved the application to Java (xalan, or whatever is
baked into jdk-1.5.x), it still renders *some* nodes selected with
preceding::div[@tid='field'][1]
with the value of the first node. For those, I tried flipping it by
replacing "[1]" with "[last()]" again, but that hack only worked for some
nodes.
Other than when run programmatically, the stylesheet works perfectly fine
with "xsltproc" (MacOS/Linux) and "msxsl" on Windoze.
I also tried your Saxon-6.5.5 which works fine from the command line,
i.e. java -jar /opt/saxon-6.5.5/saxon.jar af.xhtml fbdata.xsl
...works. Unfortunately, I get the same weird results when I replace
the default "javax.xml.transform.TransformerFactory" impl with
"com.icl.saxon.TransformerFactoryImpl".
Actually, saxon won't even read the xsl file unless I override the parser
and revert to the built-in jdk (xerces) parser. Unless I do that, I get:
at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:121)
at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)
Caused by: java.io.EOFException: no more input
at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
at com.icl.saxon.om.Builder.build(Builder.java:265)
at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
... 3 more
---------
java.io.EOFException: no more input
at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
at com.icl.saxon.om.Builder.build(Builder.java:265)
at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)
Overriding the parser to be
"com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"
fixes this, but the resulting transformation does not look anything like
what I get from the command line.
I'm using saxon-6.5.5 like this:
System.setProperty("javax.xml.transform.TransformerFactory",
    "com.icl.saxon.TransformerFactoryImpl");
System.setProperty("javax.xml.parsers.SAXParserFactory",
    "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
HtmlCleaner cleaner = new HtmlCleaner();
TagNode result = cleaner.clean(new File(fin), "utf-8");
Document doc = new DomSerializer(cleaner.getProperties(), true).createDOM(result);
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource ss = new StreamSource(xsl);
Transformer xform = tFactory.newTransformer(ss);
StringWriter sw = new StringWriter();
StreamResult sr = new StreamResult(sw);
xform.transform(new DOMSource(doc), sr);
sw.flush();
System.out.println(sw.toString());
BTW, when I ran saxon successfully from the command line, I fed it a
document produced by HtmlCleaner, also run from the command line, via:
java -jar /opt/jlib/htmlcleaner2_1.jar src=countrytemplate_af.html
    dest=af.data outcharset=utf-8
Thanks,
-Chris W.
Michael Kay wrote:
You're nearly there: you want
preceding::div[@tid='field'][1]
Without the [1], you select all of them throughout the document; and if you
then use something like xsl:value-of (in XSLT 1.0) then you get the one that
is first in document order.
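A minimal sketch of that point, using the JDK's built-in javax.xml.xpath API
instead of an XSLT processor. The three-pair sample document, the class name
and the helper names are all made up for illustration, and the sample has no
namespace. Converting the node-set to a string here plays the role that
xsl:value-of plays in XSLT 1.0.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrecedingNodeSetDemo {
    // Hypothetical input (not the real document): three field/value pairs.
    static final String XML =
        "<body>"
        + "<div tid='field'><a>Foo</a></div><td><div class='category_data'>one</div></td>"
        + "<div tid='field'><a>Bar</a></div><td><div class='category_data'>two</div></td>"
        + "<div tid='field'><a>Baz</a></div><td><div class='category_data'>three</div></td>"
        + "</body>";

    // The step from the original question, with no positional predicate.
    static final String EXPR = "preceding::div[@tid='field']";

    // Context node: the last category_data div in the sample.
    static Node lastValueDiv(XPath xp) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return (Node) xp.evaluate("(//div[@class='category_data'])[last()]",
                doc, XPathConstants.NODE);
    }

    // Number of nodes the step selects when evaluated as a node-set.
    static int selectedCount() {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(
                    EXPR, lastValueDiv(xp), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // String value of the same node-set: XPath 1.0 takes the first node
    // in document order, whichever axis produced the set.
    static String stringValue() {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return xp.evaluate(EXPR, lastValueDiv(xp));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectedCount()); // 3
        System.out.println(stringValue());   // Foo
    }
}
```

All three "field" divs are selected, yet the string value is that of the
first one in the document, which matches the symptom described in the
question.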
Then I tried preceding::div[@tid='field' and last()]
last() always gives a number that is 1 or more. "and last()" converts this
number to a boolean, and any number other than 0 (or NaN) is treated as
true. So you're adding "and true()" to your predicate, which doesn't change
its result. You were probably thinking of
preceding::div[@tid='field'][last()]
which means
preceding::div[@tid='field'][position() = last()]
But numeric predicates attached to a reverse axis step count the nodes in
reverse document order: 1 is the nearest, and last() is the furthest. So the
correct predicate is [1].
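The three predicate forms can be compared side by side with a small sketch,
again using the JDK's javax.xml.xpath API. The sample document, the class
name ReverseAxisDemo and the helper eval() are assumptions made for
illustration only. The context node is the third "category_data" div.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ReverseAxisDemo {
    // Hypothetical input (not the real document): three field/value pairs.
    static final String XML =
        "<body>"
        + "<div tid='field'><a>Foo</a></div><td><div class='category_data'>one</div></td>"
        + "<div tid='field'><a>Bar</a></div><td><div class='category_data'>two</div></td>"
        + "<div tid='field'><a>Baz</a></div><td><div class='category_data'>three</div></td>"
        + "</body>";

    // Evaluate expr as a string, relative to the third category_data div.
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node ctx = (Node) xp.evaluate("(//div[@class='category_data'])[3]",
                    doc, XPathConstants.NODE);
            return xp.evaluate(expr, ctx);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // On a reverse axis, position 1 is the nearest preceding match.
        System.out.println(eval("preceding::div[@tid='field'][1]"));      // Baz
        // last() is the furthest one, i.e. the first in document order.
        System.out.println(eval("preceding::div[@tid='field'][last()]")); // Foo
        // "and last()" is just "and true()": all three matches remain.
        System.out.println(eval("count(preceding::div[@tid='field' and last()])")); // 3
    }
}
```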
Regards,
Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
-----Original Message-----
From: Chris Wolf [mailto:cw10025(_at_)gmail(_dot_)com]
Sent: 03 October 2009 20:37
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] trouble with preceding:: and parsing xhtml
I have some xhtml documents that I want to process with XSL.
The patterns that I'm interested in have a series of
occurrences of "div" elements in pairs, as in:
<xhtml...>
  <head/>
  <body..>
    <table...>
      <tr..>
        <td...>
          <div tid="field"><a href="...">Foo</a></div>
          <table...>
            <tr...>
              <td...>
                <div class="category_data">Bla,Bla,Bla</div>
<...>
This pattern of paired div variations repeats an
arbitrary number of times throughout the document, and there
could be other "div" elements interspersed, but not with the
same qualifying attributes.
Note that the "div" with "class='category_data'" is not a
descendant of the first "div[@tid='field']".
I don't think these pairs of DIVs are siblings either (at the
same level).
Basically, I'm trying to generate XML of name-value pairs
where the name comes from the content of the <a/> in the first
"div[@tid='field']", and the value is the
content of the second "div[@class='category_data']".
So the output should be:
<Field name="Foo">Bla,Bla,Bla</Field>
Where the value of the "name" attribute is the content of the
input doc's
div[@tid='field']/a, i.e. in this example, 'Foo'
...and the content of "Field" is the content of the input doc's
div[@class='category_data']
Since the second div is not a descendant of the first, I
can't capture the <a/> content in a variable and call
<xsl:apply-templates select="div[@class='category_data']"/>
with a parameter.
The question is how else to pass data from one template to
another template?
I tried "reaching back" from the second template by using
preceding::div[@tid='field']
but this retrieved the value of the first node matching
"div[@tid='field']", not the immediately preceding node that
matches, as I would have expected. Then I tried
preceding::div[@tid='field' and last()]
with the same result: always the same value, and always the
value of the very first node that matched.
I guess I have no idea how "preceding::" is supposed to work.
I would greatly appreciate any help.
Thanks,
-Chris
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="div a"/>
  <xsl:template match="/">
    <xsl:message>***** ROOT</xsl:message>
    <xsl:apply-templates select="//h:div"/>
  </xsl:template>
  <xsl:template match="h:div[@tid='field']">
    <xsl:message>***** DIV1</xsl:message>
    <xsl:apply-templates select="h:div"/>
  </xsl:template>
  <xsl:template match="h:div[@class='category_data']">
    <xsl:param name="fname"/>
    <xsl:message>***** DIV2</xsl:message>
    <xsl:message>^<xsl:value-of select="preceding::h:div[@tid='field']"/>^</xsl:message>
    <xsl:element name="Field">
      <xsl:attribute name="name">
        <xsl:value-of select="preceding::h:div[@tid='field']"/>
      </xsl:attribute>
      <xsl:value-of select="."/>
    </xsl:element><xsl:text>
</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:message>***** TEXT</xsl:message>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--