xsl-list

Word to other XML conversion. [ Re: [xsl] where to look for xsl folk..]

2016-06-20 15:18:15

We have an application that was used to interactively convert Word document 
finding aids into EAD XML.

  https://github.com/uvalib/transmog

and I believe it can be adapted to convert to TEI XML instead. 


The templates here are a set of rules that use regular expressions on the 
headings to guess what XML elements those paragraphs should be assigned to, and 
it looks like it could probably be reconfigured to output TEI instead of EAD.

  https://github.com/uvalib/transmog/tree/master/src/main/resources
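
As a rough illustration of that kind of rule set (this is not transmog's actual template code; the patterns, the `RULES` table, and the `guess_element` and `assign` helpers are all hypothetical, and only the EAD element names are real), a heading-to-element guesser might look something like this in Python:

```python
import re

# Hypothetical rules: (pattern applied to a section heading, EAD element name).
# These are illustrative guesses, not the rules shipped with transmog.
RULES = [
    (re.compile(r"^\s*(biograph|histor)", re.IGNORECASE), "bioghist"),
    (re.compile(r"^\s*scope\b", re.IGNORECASE), "scopecontent"),
    (re.compile(r"^\s*arrangement\b", re.IGNORECASE), "arrangement"),
    (re.compile(r"\b(access|restrictions?)\b", re.IGNORECASE), "accessrestrict"),
]

# Fallback when no rule matches: EAD's "other descriptive data" element.
DEFAULT_ELEMENT = "odd"


def guess_element(heading):
    """Return the element name of the first rule whose pattern matches the heading."""
    for pattern, element in RULES:
        if pattern.search(heading):
            return element
    return DEFAULT_ELEMENT


def assign(sections):
    """Attach a guessed element to each (heading, paragraphs) pair.

    The result is a list of plain dicts so that a reviewer (or a UI)
    can overwrite the "element" value before any XML is written.
    """
    return [
        {"heading": heading, "element": guess_element(heading), "paragraphs": paragraphs}
        for heading, paragraphs in sections
    ]


if __name__ == "__main__":
    sections = [
        ("Biographical Note", ["..."]),
        ("Scope and Content", ["..."]),
        ("Provenance", ["..."]),
    ]
    for item in assign(sections):
        print(item["heading"], "->", item["element"])
```

Retargeting a table like this at TEI would presumably mean swapping the element names on the right-hand side, and keeping the guesses as editable data is what leaves room for a manual review step.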


The webapp displays those guesses and lets you rearrange or reassign them.
So it doesn’t solve the problem of writing XSLT conversion rules, but it does 
help with converting documents that may not exactly follow those rules. 

Typically the converted documents still require some manual QA and editing.


— Steve Majewski / UVA Alderman Library





On Jun 20, 2016, at 3:30 PM, G. Ken Holman 
g(_dot_)ken(_dot_)holman(_at_)gmail(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Indeed hard does not mean impossible.  The Inera folks have a strong product 
named eXtyles for going from Word to various JATS derivatives including 
ISOSTS that I am personally interested in:

 http://www.inera.com/resources/extyles-related-technologies

I haven't heard much of any other Word-based products ... but I post this to 
point out that it has been done successfully commercially.

. . . . . . . Ken

At 2016-06-20 18:58 +0000, Wendell Piez wapiez(_at_)wendellpiez(_dot_)com 
wrote:

Hi,

On Mon, Jun 20, 2016 at 10:36 AM, Christopher R. Maden 
crism(_at_)maden(_dot_)org
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
On 06/19/2016 04:17 PM, adam adam@coko.foundation wrote:

We are working with docx files that need to be translated into HTML. The
docx files are chapters of scholarly content that constitute a book. We
need to translate the docx into a tidy HTML version with direct
translation of semantic elements but with the elimination of styles.

There are a few tools to do this kind of thing.  The Public Knowledge
Project is working on integrating them into a pipeline; it's not ready for
prime time *quite* yet, but it's getting there, and the individual
components may be useful to you on their own.  Check out <URL:
https://github.com/pkp/xmlps > for source and more info.

Indeed there are a number of different such initiatives, some of them
including XSLT and so on topic. :-)

(In fact didn't Eliot recently mention his thing for a Word -> DITA pathway?)

Whether using XSLT (and on topic) or not -- converting from Word (what
I like to call a 'paintbrush' application) into strong markup is going
to be a hard problem, largely because its boundaries are not in an
obvious place, plus they move. It will always be contested what is in
scope vs what is not, and there will be a tradeoff between generic and
specialized capabilities.

Hard doesn't mean impossible, however, and what would be nice would be
a toolkit that could be adapted for local use....

Cheers, Wendell

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^



--
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/s/ |
G Ken Holman _ _ _ _ _ _ _ _ _ _ 
mailto:gkholman(_at_)CraneSoftwrights(_dot_)com |
Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |



