Re: [xsl] feasibility of HTML input

2006-03-17 12:03:36
Dianne,
This is going to be real hacky, so get ready.

This suggestion uses XSLT 1.0.

Your HTML document must be on-line to do this.

Ok, you have some html that is messy and you want to capture it into
your xslt.

You could go to one of the Tidy on-line urls.

http://www.1-hit.com/all-in-one/tool.html-cleaner.htm

Use the site to clean up the HTML. Pick the options you want, etc.

Actuate the tidy function on the site. When you get the result, you
can capture the URL.

Copy the part of that URL that comes before your HTML URL. This string
can then be used as a variable in your XSLT.

In your XSLT you can simply concat your document's URL onto that string
(inside the document() call, or in a variable beforehand).

This will give you a valid xml document to work with in your xslt.
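
The string handling described above can be sketched roughly as follows.
This is only an illustration in Python, not XSLT, and both URLs are
made-up placeholders (they are not the real address of any Tidy
service); the function names are hypothetical too.

```python
# Hypothetical values: neither URL comes from the original post.
HTML_URL = "http://reports.example.invalid/junit/index.html"
CAPTURED_URL = "http://tidy.example.invalid/clean?url=" + HTML_URL


def tidy_prefix(captured_url: str, html_url: str) -> str:
    """Return the part of the captured result URL that precedes html_url."""
    index = captured_url.find(html_url)
    if index == -1:
        raise ValueError("the HTML URL does not appear in the captured URL")
    return captured_url[:index]


def tidied_url(prefix: str, html_url: str) -> str:
    """Concatenate the saved prefix and a document URL (plain string concat)."""
    return prefix + html_url


# Save the prefix once, then reuse it for any other on-line HTML document.
prefix = tidy_prefix(CAPTURED_URL, HTML_URL)
other_report = tidied_url(prefix, "http://reports.example.invalid/emma/index.html")
```

In XSLT 1.0 the same concatenation would presumably be written with
concat() and the result handed to document(). Whether a given service
needs the embedded URL percent-encoded is not something the steps above
settle, so treat that as something to check.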

Anyway, if you want (or can stand) to hear more, I can give you more
detail.

But it did work when I used it and saved me a lot of time in the
process.

Mike Ferrando
Library Technician
Library of Congress
Washington, DC
202-707-4454



--- Jay Bryant <jay(_at_)bryantcs(_dot_)com> wrote:

Hi, Dianne,

The only trick to using HTML as input to XSLT is that the HTML has to
comply with the definition of well-formed XML. To do that, use one of
the Tidy programs.
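
As a rough illustration of what "well-formed" means here, the sketch
below uses Python's standard XML parser to test a few fragments. The
helper name is_well_formed is made up for this example, and the
fragments are invented, not taken from any real report.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical HTML habits that an XML parser rejects:
unclosed_tag = "<p>one<br>two</p>"          # <br> is never closed
bare_attribute = "<td nowrap>cell</td>"     # attribute without a value
html_entity = "<p>a&nbsp;b</p>"             # entity not defined in plain XML

# The kind of markup a Tidy program aims to produce instead:
tidied = "<p>one<br/>two</p>"

results = [is_well_formed(s) for s in (unclosed_tag, bare_attribute, html_entity, tidied)]
```

Only the last fragment passes; an XSLT processor would refuse the other
three in the same way before any template ran.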

From reading the other responses, I see that you might also be able to
get XML as your input. That will be MUCH more straightforward and very
likely save you a bunch of time. Were I in similar straits, I would
definitely go that route.

FWIW

Jay Bryant
Bryant Communication Services

----- Original Message ----- 
From: <didoss(_at_)comcast(_dot_)net>
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Friday, March 17, 2006 11:04 AM
Subject: [xsl] feasibility of HTML input


I'm new to the list and to xsl and xslt.

The goal of this e-mail is to just confirm the feasibility of my
endeavor. It would be a bonus if someone pushed me in a helpful
direction - or I can keep wandering, which is ok too.

I haven't found much about the feasibility of using an html file as
input. I didn't find anything useful through Google searches, though
being new to xsl and xslt, I might have not entered the right phrase.
The 2 O'Reilly books that I have also didn't clearly direct me towards
a solution - but also didn't say that it couldn't be done.

Digging through the FAQ, here, I *did* finally find a couple references
to using HTML as input. That at least gave me confidence that this is
not a completely insane idea. I didn't get a clear idea of the
requirements, but definitely understood that I should TIDY my html
before trying to parse it. :)

So, here I am thinking that it might be possible, but I have spent a
bit of time digging, and decided that I might want to check with the
experts before spinning wheels further.

=========================================
Is this feasible,...worthwhile,...better done with another utility?
=========================================

My team produces nightly JUnit reports and Emma coverage reports for our
code. I have added a task to copy off the top-level html pages for
these results for historical purposes. I would like to be able to run
a transform across the files in the respective directory (one transform
for JUnit and one for Emma) to create summary files (probably comma
delimited, to be able to pull into Excel). The summary file could then
be used to recognize and learn from trends in these results.
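
To make the goal concrete, here is a minimal sketch of the
"table rows to comma-delimited summary" step, written in Python rather
than XSLT and assuming the page has already been tidied into
well-formed XHTML. The SAMPLE markup and its column names are invented
for illustration; the real JUnit and Emma pages will be laid out
differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented stand-in for a tidied report page (not real JUnit/Emma output).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><th>Tests</th><th>Failures</th></tr>
      <tr><td><b>120</b></td><td>3</td></tr>
    </table>
  </body>
</html>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def table_rows(xhtml: str) -> list:
    """Collect the text of every th/td cell, one list per table row."""
    rows = []
    for element in ET.fromstring(xhtml).iter():
        if local_name(element.tag) != "tr":
            continue
        cells = ["".join(cell.itertext()).strip()
                 for cell in element
                 if local_name(cell.tag) in ("td", "th")]
        if cells:
            rows.append(cells)
    return rows


def to_csv(rows: list) -> str:
    """Serialize the rows as comma-delimited text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


summary = to_csv(table_rows(SAMPLE))
```

An XSLT stylesheet with text output would do the equivalent by matching
the row elements and emitting the cell values separated by commas.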

If this is feasible and worthwhile, and not better done with another
utility, I will send my current xsl and what I'm running into with it.

Thanks for any advice and/or direction you can provide,
Dianne



--~------------------------------------------------------------------
XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--