xsl-list

[xsl] Changing from unstructured HTML to XML

2010-09-21 08:17:03
Hi all,

First of all, I want to thank the people here for their help in
getting me on my feet, with special thanks to Gerrit. I've been
learning to solve some of my own problems (such as how to get rid of
xmlns="") but there's one -- for a completely different project --
that's stumping me on a conceptual level.

I am working with an HTML input file, and I'd like to group things
better by sections (ultimately, with the intent of using
xsl:result-document to create a new file for each section).

What I have is not uncommon:

<h1 class="section">Section Name</h1>
<h1 class="headline">Headline name</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 2</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 3</h1>
[... assorted HTML marked up text ...]
<h1 class="section">Section 2</h1>
<h1 class="headline">Headline 4</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 5</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 6</h1>
[... assorted HTML marked up text ...]

and so on.

What I'd like to end up with, if possible, is:

<section id="Section Name">
  <headline id="Headline name">
     [...marked up text...]
  </headline>
  <headline id="Headline 2">
     [...marked up text...]
  </headline>
  <headline id="Headline 3">
     [...marked up text...]
  </headline>
</section>
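
For concreteness, here is a minimal sketch of how that grouping might be
expressed. It assumes XSLT 2.0 (already implied by xsl:result-document),
and it assumes that the h1 elements and the marked-up content are all
sibling children of body, in no namespace. The <sections> wrapper element
is a hypothetical name, added only so the output has a single root.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: body holds the h1 elements and the content as siblings,
       and the input elements are in no namespace. -->
  <xsl:template match="body">
    <!-- "sections" is a hypothetical wrapper name, not from the post. -->
    <sections>
      <!-- Outer grouping: a new group starts at each "section" h1. -->
      <xsl:for-each-group select="*"
          group-starting-with="h1[@class = 'section']">
        <!-- The first group may hold content that precedes any section
             h1; it is skipped here. -->
        <xsl:if test="self::h1[@class = 'section']">
          <section id="{normalize-space(.)}">
            <!-- Inner grouping: drop the section h1 itself, then start
                 a new group at each "headline" h1. -->
            <xsl:for-each-group
                select="current-group()[position() gt 1]"
                group-starting-with="h1[@class = 'headline']">
              <xsl:choose>
                <xsl:when test="self::h1[@class = 'headline']">
                  <headline id="{normalize-space(.)}">
                    <!-- Copy everything after the headline h1. -->
                    <xsl:copy-of
                        select="current-group()[position() gt 1]"/>
                  </headline>
                </xsl:when>
                <xsl:otherwise>
                  <!-- Content between a section h1 and its first
                       headline h1 is copied through unwrapped. -->
                  <xsl:copy-of select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </section>
        </xsl:if>
      </xsl:for-each-group>
    </sections>
  </xsl:template>

</xsl:stylesheet>
```

If the input is XHTML, the element names in the patterns would need a
namespace prefix (or xpath-default-namespace on the stylesheet). Writing
one file per section would presumably mean wrapping each <section> in
xsl:result-document, with an href derived from the section name.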


Maybe this is "XSL 101" and it must be common in HTML-to-XML
transformations. I would imagine that there must be some techniques to
form a proper tree such that a conventional HTML page is turned into

<body>
<h1>
   <h2>
      <h3>
         <p>Marked up text</p>
      </h3>
   </h2>
</h1>
</body>


but I'm not having much luck finding the techniques to do this.
Certainly the "tidy" implementations are no help. :-)

Any pointers are appreciated. Thanks!

- Evan
