xsl-list

RE: [xsl] Transform HTML to XML using XSL

2007-01-19 04:42:05

Your first step is to run the input through the tidy utility to turn it into
well-formed XML (preferably XHTML). At the moment it won't parse as XML
because of all the entities. Or you could use the TagSoup parser to provide
the input to the transformation, which will do this conversion for you
"inline".

By looking at your input and output I can discern a few rules, which one can
write as template rules, for example

<xsl:template match="FONT[@size='4']">
  <title><xsl:apply-templates/></title>
</xsl:template>

But in fact, one of your desired titles has <label> and <b> elements within
it, and I've no idea what it is in the input that causes these to be
generated. So I would suggest you proceed iteratively, adding rules like the
above incrementally to get closer to the output that's needed. If you're
converting a whole batch of documents, you should check the rules work on a
reasonable sample of them. Every time you don't get quite the output you
want, see what clues there are in the input to enable you to refine the
rules.

You'll want to start with a stylesheet that copies things unconditionally:

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

Then add rules for specific elements, or element patterns, that vary the
processing for those elements.
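
Put together, the copy-everything default plus a few overrides might look roughly like the sketch below. Only the FONT[@size='4'] rule comes from the discussion above; the rules for I and BR are guesses from the sample input, not confirmed requirements. It also assumes the cleaned-up input keeps the upper-case, no-namespace element names shown in the sample. If tidy lower-cases them or puts them in the XHTML namespace, the match patterns need adjusting.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default: copy each element and its attributes, then process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: a size-4 FONT becomes a title -->
  <xsl:template match="FONT[@size='4']">
    <title><xsl:apply-templates/></title>
  </xsl:template>

  <!-- Override (assumed from the sample): I maps to lower-case i -->
  <xsl:template match="I">
    <i><xsl:apply-templates/></i>
  </xsl:template>

  <!-- Override (assumed from the sample): drop a BR directly inside a title FONT -->
  <xsl:template match="FONT[@size='4']/BR"/>

</xsl:stylesheet>
```

The overrides win over the match="*" rule because patterns with a predicate or a step have a higher default priority than a bare wildcard, so no explicit priority attributes are needed here.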

It might be that you hit some structural issues, for example where you want
to create an output element that corresponds to a consecutive sequence of
input elements. This "positional grouping problem" often arises in
up-conversion exercises like this one. There are well-known solutions - and
it's much easier in XSLT 2.0. When you get to that point, come back to the
list and identify the specific problem that's blocking you, taking care to
separate it from all the noise that surrounds it.
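
As an illustration of the XSLT 2.0 approach, here is a minimal sketch using xsl:for-each-group with group-starting-with. It assumes, purely from the sample input, that a P containing a size-4 FONT marks the start of a new unit and that any following P elements belong to it until the next such P. It does not attempt to tell chapters from levels or to generate the id attributes; those rules would still have to be worked out from the data.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each P that contains a size-4 FONT starts a new group; the P elements
       after it, up to the next such P, fall into the same group. -->
  <xsl:template match="BODY">
    <document>
      <xsl:for-each-group select="P"
                          group-starting-with="P[.//FONT[@size='4']]">
        <level>
          <!-- Process the members of the group with whatever rules apply -->
          <xsl:apply-templates select="current-group()"/>
        </level>
      </xsl:for-each-group>
    </document>
  </xsl:template>

</xsl:stylesheet>
```

This needs an XSLT 2.0 processor. In XSLT 1.0 the same effect usually takes recursion over following siblings or a key-based technique, which is considerably more work.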

I know communication in a foreign language can be difficult, but asking "How
start and close in chapter and section tag" isn't going to get an answer.
Starting and closing tags is what you do all the time in XSLT (though that's
not actually the correct terminology). We need to know what the particular
problem is in this case. 

Michael Kay
http://www.saxonica.com/
  
 

-----Original Message-----
From: Byomokesh [mailto:bkesh(_at_)eztechgroup(_dot_)net] 
Sent: 19 January 2007 07:16
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Transform HTML to XML using XSL

Hi All,

HTML File
=========
<HTML>
<BODY>
<P align="center"><FONT face="Arial" size="2"><FONT 
size="4">Pr&oacute;logo<BR/></FONT><FONT size="5"
color="#FF0000"><I>Comien&ccedil;a Cath&oacute;lica Magestad 
delinvict&iacute;ssimo</I> semper</FONT> Emperador de 
Roma.</FONT></P> <P align="center"><FONT 
size="4">Argumento<BR/></FONT><FONT size="5"
color="#FF0000"><I>S&iacute;guese el Argumento 
del</I></FONT></P> <P align="center"><FONT 
size="4">Cap&iacute;tulo I<BR/></FONT><FONT size="5" 
color="#FF0000"><I>Marco Aurelio Emperador.</I></FONT></P> 
</BODY> </HTML>

I Want Output
=============
<document>
<chapter id="FM01"><title>Front Matter</title> <level 
id="pref01"><title>Pr&oacute;logo</title>
<para><i>Comien&ccedil;a Cath&oacute;lica Magestad 
delinvict&iacute;ssimo</i> semper Emperador de Roma.</para>
<-- Some para continue -->
</level>
<level id="pref02"><title>Argumento</title>
<para>S&iacute;guese el Argumento del</para> </level> 
</chapter> <chapter 
id="Ch01"><title><label><b>Cap&iacute;tulo I</b></label> 
<b>Marco Aurelio Emperador.</b></title>
<!-- then para continue and again chapter start --> 
</chapter> </document>

---------------------

Same tag but i need different condition to xml output.

1. How start and close in chapter and section tag.
2. <BR/> -- tag some cases inline text and some cases need 
para taging in base of XML output.

Please anyone help....

Thanks and Regards
Byomokesh




--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



