Re: [xsl] Scraping to Analyze Structure

2016-09-08 13:41:04
Ah, yes, curl and tagsoup; that's a good strategy.

Even so, since this is an application that requires many inputs to get to
various pages, Selenium or some other WebDriver is still needed.

I found the xslt example here:


and it outputs something like this:


So, that's what I was looking for. I'm having a little trouble getting this
to work in Java, though.
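
For reference, here is a minimal sketch of the Java side using only the JAXP classes that ship with the JDK. It is not the stylesheet linked above: the class name ControlPaths, the method listControlPaths, and the embedded XSLT 1.0 stylesheet are all hypothetical, written only to illustrate the idea of listing one XPath per form control. It assumes the page has already been turned into well-formed XML without namespaces (for example by tagsoup with --nons).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper (illustrative only): prints one XPath per form control.
class ControlPaths {

    // Illustrative XSLT 1.0 stylesheet. For each control it walks
    // ancestor-or-self::* in document order and emits /name[index] per step,
    // where index counts same-named preceding siblings plus one.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/'>",
        "    <xsl:for-each select='//input | //select | //textarea | //button'>",
        "      <xsl:for-each select='ancestor-or-self::*'>",
        "        <xsl:variable name='n' select='name()'/>",
        "        <xsl:text>/</xsl:text>",
        "        <xsl:value-of select='$n'/>",
        "        <xsl:text>[</xsl:text>",
        "        <xsl:value-of select='count(preceding-sibling::*[name() = $n]) + 1'/>",
        "        <xsl:text>]</xsl:text>",
        "      </xsl:for-each>",
        "      <xsl:text>&#10;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to a well-formed XML string and returns the text output.
    static String listControlPaths(String wellFormedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(wellFormedXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page, standing in for tagsoup output.
        String page = "<html><body><form><input name='user'/><input name='pass'/>"
                    + "<select name='role'/></form></body></html>";
        System.out.print(listControlPaths(page));
    }
}
```

Saving that output for each test run and diffing it against the previous run would be one way to report which control paths changed. Swapping the sample string for real tagsoup output is the part I have not sketched here.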


On Mon, Aug 29, 2016 at 5:28 PM, Ihe Onwuka 
ihe(_dot_)onwuka(_at_)gmail(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

In bash I do:

curl theWebPage.html | java -jar $HOME/tagsoup-1.2.1.jar --nons |
  java -jar $HOME/saxon9he.jar -s:- -xsl:yourXSLTFile.xsl

which pipes the web page under test into tagsoup, which converts it to
well-formed XML, which I then pipe into an XSL transformation.

I don't bother with things like Selenium for exactly the reasons you are
complaining about but of course your team may not buy into that.

On Mon, Aug 29, 2016 at 7:25 PM, Hank Ratzesberger 
xml(_at_)xmlwerks(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi XSL List,

I am hoping to improve our test automation built on Selenium. The XPath
expressions for elements in our tests are complicated. Any change breaks the
workflow, and fixing the XPath is a slow, manual process.

If, in the process of running a test, the web page were scraped and
put into an XML file, or even a text file, with the XPath to all inputs and
other controls, differences could be reported, and those differences might
even be cut and pasted to fix the test in the next update.

In any case, processing this way could rationalize / normalize the xpath
to all controls.  This way, developers don't have to keep deciphering when
pages change.

Has anyone here seen something like this? It would seem to be something
xslt was made for.


Hank Ratzesberger
XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe <http://-list/601651> (by email)

Hank Ratzesberger
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com