
RE: [xsl] XPath Question (related to Java)

2007-06-25 15:03:52
I would certainly tend to do this in XSLT unless I needed to (and had time
to) make it ultra-efficient, in which case a Java solution might be faster.

I would never attempt to hand-parse XML, but there are cases where combining
several XML documents into one big document "by hand" is perfectly OK,
including a bit of manipulation like stripping off the XML declaration - so
long as you are confident the files all use the same encoding, don't use
internal DTDs, and so on.

Michael Kay
http://www.saxonica.com/
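
For illustration, the "by hand" combination described above could be sketched
in Java roughly as follows. This is only a sketch under the stated
assumptions: every input uses the same encoding and has no DOCTYPE. The class
name XmlConcatenator, the method combine and the wrapper element name passed
to it are hypothetical, not taken from the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class XmlConcatenator {

    // Matches an XML declaration at the start of a document, plus any
    // whitespace around it. Assumes the document has no DOCTYPE.
    private static final Pattern XML_DECL =
            Pattern.compile("^\\s*<\\?xml\\s[^>]*\\?>\\s*");

    // Wraps the given documents in a single root element after stripping
    // each document's XML declaration. No parsing is done, so the inputs
    // are assumed to be well-formed and to share one encoding.
    public static String combine(String rootName, List<String> documents) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootName).append('>');
        for (String doc : documents) {
            sb.append(XML_DECL.matcher(doc).replaceFirst(""));
        }
        sb.append("</").append(rootName).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "<?xml version=\"1.0\"?>\n<Term><Entry>a</Entry></Term>",
                "<Term><Entry>b</Entry></Term>");
        System.out.println(combine("Terms", docs));
    }
}
```

The result is a plain string, so it should still be run through a real parser
before being trusted as well-formed.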

-----Original Message-----
From: Grant Slade [mailto:grant(_dot_)slade(_at_)gmail(_dot_)com] 
Sent: 25 June 2007 00:33
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] XPath Question (related to Java)

Hi Michael - thanks for the heads up.  Maybe I can ask you and the
group a more general question.  What I was trying to do was go through
a file of dictionary terms, read in the terms one at a time, and then
add them to a third-party native XML database application that takes a
well-formed XML document (but in String format, hence my trying to get
the information from it in String format).  I have been trying to be a
good student of XML and learn the APIs, but I am wondering if in some
cases it is better to just parse it as a string, such as in this case
where it needs to retain the tagging.  Or maybe XSLT would have been a
better option to go with from the beginning?

On 6/24/07, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
In the XPath data model, you see nodes rather than markup. That's why
there's no "<" present. Instead, the Definition element will have a
child that is a <sub> element.

Evaluating the expression as a string will give you the string value
of the node, which is the concatenation of all the contained text,
ignoring the markup.

You seem to want to serialize the node as XML, to reinstate the markup.
There's no direct way of doing that in the XPath API; you probably
have to do an identity transformation from a DOMSource containing the
node to a StreamResult. (You'll have to change your call to retrieve a
NODESET rather than a STRING.) Alternatively there may be a method
such as toXML() on the DOM Node object - I've forgotten.

Michael Kay
http://www.saxonica.com/
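
The identity-transformation approach described above might look something
like the following in Java. It is a minimal sketch, not necessarily what
either poster ended up using. It asks XPath for a single NODE rather than a
NODESET, since only one Definition is expected, and the class and method
names (DefinitionSerializer, getDefinitionMarkup) are made up for
illustration. Note that the standard org.w3c.dom.Node interface does not
define a toXML() method.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DefinitionSerializer {

    // Returns the Definition child of the given node serialized as XML,
    // so that nested elements such as <sub> keep their markup.
    public static String getDefinitionMarkup(Node node) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Request a NODE instead of a STRING so the result is still a tree.
        Node definition = (Node) xpath.evaluate("Definition", node, XPathConstants.NODE);
        if (definition == null) {
            return "";
        }
        // A Transformer created without a stylesheet copies its input
        // unchanged, i.e. it performs an identity transformation.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(definition), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Term><Definition>ozone (O<sub>3</sub>)</Definition></Term>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(getDefinitionMarkup(doc.getDocumentElement()));
    }
}
```

Setting OMIT_XML_DECLARATION keeps the output to just the element, which is
convenient when the string is going to be embedded in a larger document.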

-----Original Message-----
From: Grant Slade [mailto:grant(_dot_)slade(_at_)gmail(_dot_)com]
Sent: 24 June 2007 19:03
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XPath Question (related to Java)

Hi, I have the following xml which gets read from a file as part of
a Node:
            <Definition> An organic compound in which the aldehyde
group (HC=O) is connected to a branched or unbranched open chain of
carbon atoms rather than a ring.
Some aldehydes are created during the reactions of oxidants used as
disinfectants, particularly ozone (O<sub>3</sub>), with natural
organic matter. </Definition>

When I run it through the following method, it ignores the <sub></sub>:
      public String getDefinitionFromNode(Node node) throws javax.xml.xpath.XPathExpressionException
      {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String definitionExpression = "Definition";
            String definition = (String) xpath.evaluate(definitionExpression, node, XPathConstants.STRING);
            if(definition.contains("<"))
                  System.out.println ("found a <");
            else
            {
                  System.out.println ("did not find a <");
            }
            return definition;
      }

When the program runs, it outputs the following:

did not find a <
--------------------------------
<dictionary n=""><TermName>aliphatic aldehyde</TermName><Definition>An
organic compound in which the aldehyde group (HC=O) is connected to a
branched or unbranched open chain of carbon atoms rather than a ring.
Some aldehydes are created during the reactions of oxidants used as
disinfectants, particularly ozone (O3), with natural organic
matter.</Definition></dictionary>

How do I get it to output the <sub></sub> elements?

The complete node is:
        <Term>
            <Entry> aliphatic aldehyde </Entry>
            <Definition> An organic compound in which the aldehyde group (HC=O) is connected to a
                branched or unbranched open chain of carbon atoms rather than a ring. Some aldehydes
                are created during the reactions of oxidants used as disinfectants, particularly
                ozone (O<sub>3</sub>), with natural organic matter.
            </Definition>
            <SeeAlso>disinfection by-product</SeeAlso>
            <IMAGE fileName="A-17.gif"/>
        </Term>


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




