On May 26, 2009, at 5:11 PM, Robert Koberg wrote:
Hi,
(I've gone through this many times.) Usually, I find the easiest thing
to do is open the file in OpenOffice and export it as XHTML. That way
you get the structure you want and can then whittle the rest of the
junk away until it conforms to some schema, maybe splitting out content
pieces based on H1s (we sometimes get whole websites written in
Word).
I might not have been clear - I meant that we use XSL to remove the
unnecessaries. (don't want to get yelled at :) )
Start out with the identity template and any obvious matches. Remove
Ps that only contain whitespace. Remove pretty much all attributes.
Remove many unnecessary SPANs. Doesn't take long: edit the xsl, run
the transform, check validity, rinse, wash, repeat.
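
A minimal sketch of that starting point, assuming XSLT 1.0 and XHTML
input in the usual XHTML namespace. The list of attributes to keep
(href, src, alt) is only an illustration, and unwrapping every SPAN is
a simplification of "many"; adjust both to whatever your target schema
allows.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- Identity template: copy anything not matched by a more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop paragraphs with no child elements and only whitespace text. -->
  <xsl:template match="h:p[not(*) and normalize-space(.) = '']"/>

  <!-- Drop attributes, except a short keep-list (an assumption, not a rule). -->
  <xsl:template match="@*[not(name() = 'href' or name() = 'src' or name() = 'alt')]"/>

  <!-- Unwrap spans: keep their content, discard the element itself. -->
  <xsl:template match="h:span">
    <xsl:apply-templates select="node()"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that normalize-space() does not treat the non-breaking space
(U+00A0) as whitespace, so paragraphs holding only a non-breaking space
will survive this rule unless you add a translate() step. For the loop
itself, any XSLT processor and validator will do; with libxml2 tools,
for example, that could be xsltproc for the transform and xmllint with
--schema for the validity check.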
best,
-Rob
Another thing we do is just paste the Word content into our web-based
editor - Xopus - and it does the work of converting it to the
current XML Schema. It does a really good job, but there is usually
some cleanup, which is done by the author.
best,
-Rob
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--