xsl-list

Re: [xsl] Community Conversion

2016-05-31 10:06:38
It's not as direct as you want, but the DITA For Publishers project
(dita4publishers.org) includes a general Word-to-DITA transformation
framework that makes it relatively easy to generate DITA XML from styled
Word documents.

From the generated DITA you can then generate HTML using the normal DITA
tool chain (the DITA Open Toolkit).

This framework depends on some unique features of DITA but it could be
adapted to generate HTML directly rather than DITA.

The transformation is implemented as a two-phase process:

Phase 1: Generate a simplified form of the Word XML, which I call "simple
word processing format". This captures the essential structure and style
details of the original Word document while eliminating all of the hideous
verbosity of the Office Open markup design.
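As a rough illustration of the Phase 1 idea (this is not the project's
actual code), here is a minimal Python sketch that reduces Word's
document.xml to one element per paragraph carrying only the style name
and the text. The output names "simple-doc" and "p", and the fallback to
a "Normal" style, are assumptions made for the example; the real simple
word processing format is not shown in this message.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by document.xml inside a .docx package
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def simplify(document_xml: str) -> ET.Element:
    """Reduce each w:p to <p style="...">text</p>, dropping run-level markup."""
    source = ET.fromstring(document_xml)
    simple = ET.Element("simple-doc")  # hypothetical root element name
    for para in source.iterfind(".//w:p", NS):
        style = para.find("w:pPr/w:pStyle", NS)
        # Assumption: paragraphs with no explicit style are treated as "Normal"
        style_id = style.get(f"{{{W}}}val") if style is not None else "Normal"
        # Concatenate the text of all runs; formatting properties are ignored
        text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS))
        ET.SubElement(simple, "p", {"style": style_id}).text = text
    return simple


SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p>
  </w:body>
</w:document>"""

if __name__ == "__main__":
    print(ET.tostring(simplify(SAMPLE), encoding="unicode"))
```

A real implementation would also have to carry over character styles,
tables, footnotes and so on; the sketch only shows the shape of the
reduction.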

Phase 2: Transform the simple word processing doc into DITA. This relies
on a separate style-to-tag mapping document that relates Word styles to
DITA structures. This depends heavily on for-each-group and the code is
a bit gnarly--it grew rather organically and, while it works, I can't
claim it reflects the best engineering approach. If I were to ever rewrite
the code I'm sure I would make it much cleaner and clearer.
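To show why grouping matters in Phase 2, here is a small Python sketch
that uses itertools.groupby as a stand-in for for-each-group with
group-adjacent. It is an analogy only, not the framework's code. The
style names and the shape of STYLE_MAP are hypothetical; the real
style-to-tag mapping document has its own format.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# Hypothetical style-to-tag map: Word style -> (DITA element, wrapper or None)
STYLE_MAP = {
    "Normal": ("p", None),
    "ListBullet": ("li", "ul"),
    "ListNumber": ("li", "ol"),
}
DEFAULT = ("p", None)  # assumption: unmapped styles become plain paragraphs


def wrapper_for(paragraph):
    """Grouping key: the wrapper element a paragraph's style calls for."""
    style, _text = paragraph
    return STYLE_MAP.get(style, DEFAULT)[1]


def to_dita_body(paragraphs):
    """Build a DITA <body> from (style, text) pairs in document order."""
    body = ET.Element("body")
    # Adjacent paragraphs that need the same wrapper end up in one group,
    # which is what turns a flat run of list paragraphs into a single list.
    for wrapper, group in groupby(paragraphs, key=wrapper_for):
        parent = ET.SubElement(body, wrapper) if wrapper else body
        for style, text in group:
            tag = STYLE_MAP.get(style, DEFAULT)[0]
            ET.SubElement(parent, tag).text = text
    return body


if __name__ == "__main__":
    paras = [("Normal", "Intro"), ("ListBullet", "a"),
             ("ListBullet", "b"), ("Normal", "End")]
    print(ET.tostring(to_dita_body(paras), encoding="unicode"))
```

The two bullet paragraphs come out as two li elements inside one ul,
with the surrounding paragraphs left as direct children of body.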

This second phase could be replaced with a new HTML-generation phase,
driven either by the existing style-to-tag map or by a new one or just by
some static binding from styles to HTML markup (if such a thing is
possible).
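For the simplest of those options, a static binding could be as small
as a lookup table. The Python sketch below is only an illustration
under assumed names: HTML_BINDING and its style names are hypothetical,
and the choice to emit unmapped styles as a p with a class attribute is
mine, not something the framework does.

```python
from html import escape

# Hypothetical one-to-one binding from Word style names to HTML elements
HTML_BINDING = {
    "Heading1": "h1",
    "Heading2": "h2",
    "Normal": "p",
    "Quote": "blockquote",
}


def to_html(paragraphs):
    """Render (style, text) pairs as HTML using the static binding."""
    parts = []
    for style, text in paragraphs:
        tag = HTML_BINDING.get(style)
        if tag is None:
            # Unmapped style: keep the style name as a class so it stays visible
            parts.append(f'<p class="{escape(style, quote=True)}">{escape(text)}</p>')
        else:
            parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_html([("Heading1", "A & B"), ("Sidebar", "x")]))
```

A flat table like this only covers styles that map one-to-one onto an
element. Anything that needs a wrapper, such as list items inside a
list, would still need a grouping step.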

The Phase 1 process is pretty stable--I only have to update it when a
client needs support for some new Word feature.

The code is in GitHub here:

https://github.com/dita4publishers/org.dita4publishers.word2dita

Cheers,

Eliot

----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 5/28/16, 8:39 PM, "adam adam@coko.foundation"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

hi

I'm new to the list. My usual home is at the Collaborative Knowledge
Foundation:
http://coko.foundation/

So, I was poking around looking for any community/co-ordinated attempts
at creating some robust XSL transformations from docx to HTML. I'm aware
of TEI stylesheets and have had a good poke around in github and
elsewhere, but I'm looking at straight docx->html (sans TEI) and the few
stylesheet repos I find are not so well maintained. I am probably
missing some, so any recommendations for a thriving hub of energy around
this particular conversion would be appreciated.

However, what I'm really looking for is an active community, possibly
with its own list or web based presence where there is a community
effort to improve specific conversion types. Essentially, I'm wondering
if this already exists for docx->html or if not, then are there similar
attempts I can learn from?....my inclination is to look for, or set up,
something that had a web based component for testing so that non-XSL
experts could also contribute through manual QA of results etc...

Any thoughts or tips welcomed....

Adam



-- 

---
Adam Hyde
http://www.adamhyde.net/projects


