I would certainly tend to do this in XSLT unless I needed to (and had time
to) make it ultra-efficient, in which case a Java solution might be faster.
I would never attempt to hand-parse XML, but there are cases where combining
several XML documents into one big document "by hand" is perfectly OK,
including a bit of manipulation like stripping off the XML declaration - so
long as you are confident the files all use the same encoding, don't use
internal DTDs, and so on.
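As a rough illustration of the "by hand" combination described above, here is a minimal Java sketch. The class name ConcatXml, the helper names and the wrapper element are made up for the example. It assumes the inputs are already in memory as strings, all use the same encoding, and contain no DOCTYPE or internal DTD; it only strips a leading XML declaration and does not parse anything.

```java
import java.util.Arrays;
import java.util.List;

public class ConcatXml {

    // Remove a leading XML declaration (<?xml ... ?>) if one is present.
    // Other processing instructions such as <?xml-stylesheet ...?> are left alone.
    static String stripDeclaration(String xml) {
        // Drop a byte-order mark if the text starts with one.
        String s = xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
        if (s.startsWith("<?xml") && s.length() > 5
                && Character.isWhitespace(s.charAt(5))) {
            int end = s.indexOf("?>");
            if (end >= 0) {
                return s.substring(end + 2);
            }
        }
        return s;
    }

    // Wrap the stripped documents in a single (hypothetical) root element.
    static String combine(List<String> documents, String wrapper) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(wrapper).append(">\n");
        for (String doc : documents) {
            sb.append(stripDeclaration(doc).trim()).append('\n');
        }
        sb.append("</").append(wrapper).append(">\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Term>a</Term>",
                "<Term>b</Term>");
        System.out.print(combine(docs, "dictionary"));
    }
}
```

This is only safe under the conditions listed above; if the inputs might differ in encoding or carry a DOCTYPE, a real parser or XSLT is the better tool.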
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Grant Slade [mailto:grant(_dot_)slade(_at_)gmail(_dot_)com]
Sent: 25 June 2007 00:33
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] XPath Question (related to Java)
Hi Michael - thanks for the heads up. Maybe I can ask you
and the group a more general question. What I was trying to
do was go through a file of dictionary terms, read in the
terms one at a time and then add them to a 3rd party native
xml database application that takes a well-formed xml
document (but in String format, thus my trying to get the
information from it in String format). I have been trying to
be a good student of XML and learn the APIs, but I am
wondering if in some cases it is better to just parse it as a
string, such as in this case where it needs to retain the
tagging. Or maybe XSLT would have been a better
option to go with from the beginning?
On 6/24/07, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
In the XPath data model, you see nodes rather than markup. That's why
there's no "<" present. Instead, the Definition element will have a
child that is a <sub> element.
Evaluating the expression as a string will give you the string value
of the node: this is the concatenation of all the contained text,
ignoring the markup.
You seem to want to serialize the node as XML, to reinstate the markup.
There's no direct way of doing that in the XPath API; you probably
have to do an identity transformation from a DOMSource containing the
node to a StreamResult. (You'll have to change your call to retrieve a
NODESET rather than a STRING.) Alternatively there may be a method
such as toXML() on the DOM Node object - I've forgotten.
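The identity-transformation approach described above might look roughly like the following sketch, which uses only the standard JAXP classes. The class name SerializeNode and its helper methods are invented for the example. It asks XPath for a single NODE rather than a NODESET, which is an assumption that each term has exactly one Definition child.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SerializeNode {

    // Serialize a DOM node back to markup with an identity transformation.
    static String serialize(Node node) throws Exception {
        // newTransformer() with no stylesheet gives the identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Select the Definition child as a node (not a string) and serialize it.
    static String getDefinitionXml(Node term) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node definition =
                (Node) xpath.evaluate("Definition", term, XPathConstants.NODE);
        return definition == null ? null : serialize(definition);
    }

    // Helper for the demo only: parse a string into a DOM document.
    static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml(
                "<Term><Definition>ozone (O<sub>3</sub>)</Definition></Term>");
        // The serialized string keeps the <sub> element, unlike the string value.
        System.out.println(getDefinitionXml(doc.getDocumentElement()));
    }
}
```

The serialized form includes the enclosing Definition tags themselves; if only the content is wanted, the child nodes would have to be serialized one by one instead.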
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Grant Slade [mailto:grant(_dot_)slade(_at_)gmail(_dot_)com]
Sent: 24 June 2007 19:03
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XPath Question (related to Java)
Hi, I have the following xml which gets read from a file as part of
a Node:
<Definition> An organic compound in which the aldehyde
group (HC=O) is connected to a branched or unbranched open chain of
carbon atoms rather than a ring.
Some aldehydes are created during the reactions of oxidants used as
disinfectants, particularly ozone (O<sub>3</sub>), with natural
organic matter. </Definition>
When I run it through the following method it ignores the <sub></sub>:
public String getDefinitionFromNode(Node node)
        throws javax.xml.xpath.XPathExpressionException
{
    XPath xpath = XPathFactory.newInstance().newXPath();
    String definitionExpression = "Definition";
    String definition = (String)
        xpath.evaluate(definitionExpression, node, XPathConstants.STRING);
    if (definition.contains("<"))
    {
        System.out.println("found a <");
    }
    else
    {
        System.out.println("did not find a <");
    }
    return definition;
}
When the program runs, it outputs the following:
did not find a <
--------------------------------
<dictionary n=""><TermName>aliphatic
aldehyde</TermName><Definition>An organic compound in which the
aldehyde group (HC=O) is connected to a branched or
unbranched open
chain of carbon atoms rather than a ring.
Some aldehydes are created during the reactions of
oxidants used as
disinfectants, particularly ozone (O3), with natural organic
matter.</Definition></dictionary>
How do I get it to output the <sub></sub> elements?
The complete node is:
<Term>
  <Entry> aliphatic aldehyde </Entry>
  <Definition> An organic compound in which the aldehyde
    group (HC=O) is connected to a
    branched or unbranched open chain of carbon atoms
    rather than a ring. Some aldehydes
    are created during the reactions of oxidants used as
    disinfectants, particularly
    ozone (O<sub>3</sub>), with natural
    organic matter.
  </Definition>
  <SeeAlso>disinfection by-product</SeeAlso>
  <IMAGE fileName="A-17.gif"/>
</Term>
--~-----------------------------------------------------------------
- XSL-List info and archive:
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--