xsl-list

Re: [xsl] Changing a from unstructured HTML to XML

2010-09-21 08:30:04
Evan Leibovitch wrote:

I am working with an HTML input file, and I'd like to group things
better by sections (ultimately, with the intent of using
xsl:result-document to create a new file for each section).

What I have is not uncommon:

<h1 class="section">Section Name</h1>
<h1 class="headline">Headline name</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 2</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 3</h1>
[... assorted HTML marked up text ...]
<h1 class="section">Section 2</h1>
<h1 class="headline">Headline 4</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 5</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 6</h1>
[... assorted HTML marked up text ...]

and so on.

What I'd like to end up with is, if possible:

<section id="Section Name">
  <headline id="Headline name">
     [...marked up text...]
  </headline>
  <headline id="Headline 2">
     [...marked up text...]
  </headline>
  <headline id="Headline 3">
     [...marked up text...]
  </headline>
</section>

XSLT 2.0's xsl:for-each-group with group-starting-with can do that, e.g.

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="xml" indent="yes" version="1.0"/>

  <!-- identity transformation: copy everything not matched by a more specific template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <xsl:copy>
      <!-- outer grouping: each h1 with class="section" starts a new group -->
      <xsl:for-each-group select="node()" group-starting-with="h1[@class = 'section']">
        <!-- skip the leading group (e.g. whitespace) that precedes the first section heading -->
        <xsl:if test="self::h1[@class = 'section']">
          <section id="{.}">
            <!-- inner grouping: split the section's content at each headline heading -->
            <xsl:for-each-group select="current-group() except ." group-starting-with="h1[@class = 'headline']">
              <xsl:if test="self::h1[@class = 'headline']">
                <headline id="{.}">
                  <xsl:apply-templates select="current-group() except ."/>
                </headline>
              </xsl:if>
            </xsl:for-each-group>
          </section>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
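To make the group-starting-with semantics concrete outside XSLT, here is a small Python sketch (not part of the original reply; the function name group_starting_with and the tuple stand-ins for nodes are illustrative assumptions). Each item matching the pattern opens a new group, and anything before the first match forms a leading group of its own, which is why the stylesheet uses xsl:if to skip groups that don't start with the heading.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def group_starting_with(items: Iterable[T], starts_group: Callable[[T], bool]) -> List[List[T]]:
    """Hypothetical helper mimicking xsl:for-each-group group-starting-with:
    every item matching starts_group opens a new group; items before the
    first match form a leading group of their own."""
    groups: List[List[T]] = []
    for item in items:
        if not groups or starts_group(item):
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


# Toy stand-ins for the body's child nodes: (node name, class) pairs.
nodes = [("text", None), ("h1", "section"), ("h1", "headline"), ("p", None),
         ("h1", "section"), ("h1", "headline")]
groups = group_starting_with(nodes, lambda n: n == ("h1", "section"))

# Like the xsl:if in the stylesheet: keep only groups that start with a section heading.
sections = [g for g in groups if g[0] == ("h1", "section")]
print(len(groups), len(sections))  # 3 2
```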

will turn

<body>
<h1 class="section">Section Name</h1>
<h1 class="headline">Headline name</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 2</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 3</h1>
[... assorted HTML marked up text ...]
<h1 class="section">Section 2</h1>
<h1 class="headline">Headline 4</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 5</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 6</h1>
[... assorted HTML marked up text ...]
</body>

into

<body>
   <section id="Section Name">
      <headline id="Headline name">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 2">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 3">
[... assorted HTML marked up text ...]
</headline>
   </section>
   <section id="Section 2">
      <headline id="Headline 4">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 5">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 6">
[... assorted HTML marked up text ...]
</headline>
   </section>
</body>
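For comparison, roughly the same two-level grouping, plus the one-file-per-section step the original question was aiming at (done with xsl:result-document in XSLT 2.0), might look like this in Python using the standard library's xml.etree.ElementTree. This is a sketch under assumptions: the content between headings consists of element children such as p (ElementTree stores loose text as .text/.tail, so bare text between headings is not regrouped here), and the helpers split_groups, restructure and write_sections, as well as the sectionN.xml file names, are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_groups(elements, cls):
    """Hypothetical helper: each h1 with the given class opens a group;
    elements before the first such h1 are dropped (like the xsl:if)."""
    groups = []
    for el in elements:
        if el.tag == "h1" and el.get("class") == cls:
            groups.append((el, []))
        elif groups:
            groups[-1][1].append(el)
    return groups


def restructure(body):
    """Build <body><section id=...><headline id=...>...</headline></section></body>."""
    new_body = ET.Element("body")
    for sec_h1, sec_members in split_groups(list(body), "section"):
        section = ET.SubElement(new_body, "section", id=sec_h1.text or "")
        for head_h1, members in split_groups(sec_members, "headline"):
            headline = ET.SubElement(section, "headline", id=head_h1.text or "")
            headline.extend(members)
    return new_body


def write_sections(new_body, out_dir):
    """Write each section to its own file (assumed naming: section1.xml, ...)."""
    paths = []
    for i, section in enumerate(new_body.findall("section"), start=1):
        path = os.path.join(out_dir, f"section{i}.xml")
        ET.ElementTree(section).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


source = """<body>
<h1 class="section">Section Name</h1>
<h1 class="headline">Headline name</h1>
<p>text 1</p>
<h1 class="headline">Headline 2</h1>
<p>text 2</p>
<h1 class="section">Section 2</h1>
<h1 class="headline">Headline 4</h1>
<p>text 4</p>
</body>"""
result = restructure(ET.fromstring(source))

with tempfile.TemporaryDirectory() as out_dir:
    written = write_sections(result, out_dir)
    print([os.path.basename(p) for p in written])  # ['section1.xml', 'section2.xml']
```

Keeping the file writing separate from the restructuring mirrors the question's plan of grouping first and only then splitting the result into one document per section.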


--

        Martin Honnen
        http://msmvps.com/blogs/martin_honnen/

