Peter,
Going a little off topic, but the concept is relatively simple. Many writers
don't make the best use of their word processors. Maybe lists are manually
indented with bullets inserted from a character palette. Titles may be 'Normal'
text with character overrides for font size and weight. You get the idea?
Careful analysis of many documents showed that between eight and ten
properties have the most effect on the output for character styles and
paragraph styles. Any overrides of these properties are recorded as an
override code, in a format that is very compact but still readable by anyone.
The combination of any correctly defined style name plus its override code
gives us a key that can be used for mapping to elements in the output.
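As a rough illustration of the idea only (not the actual implementation), here is a minimal Python sketch. The property names, the abbreviations, the "/" and "-" separators, and the target element names are all hypothetical, chosen just to show how a style name plus an override code can form a lookup key.

```python
# Hypothetical set of tracked properties, in a fixed order so that the same
# overrides always produce the same code. The real list is not given here.
ABBREVIATIONS = (("bold", "b"), ("italic", "i"), ("size", "sz"), ("indent", "in"))


def override_code(overrides):
    """Return a compact, readable code for the overrides applied on top of a style."""
    parts = []
    for prop, abbrev in ABBREVIATIONS:
        if prop not in overrides:
            continue
        value = overrides[prop]
        # Flags set to True contribute only their abbreviation; other values are appended.
        parts.append(abbrev if value is True else f"{abbrev}{value}")
    return "-".join(parts)


def mapping_key(style_name, overrides):
    """Combine the style name and its override code into a single key."""
    code = override_code(overrides)
    return f"{style_name}/{code}" if code else style_name


# Hypothetical mapping from keys to output elements.
ELEMENT_MAP = {
    "Normal": "para",
    "Normal/b-sz16": "title",    # 'Normal' text made to look like a title
    "Normal/in36": "listitem",   # manually indented pseudo-list
}


def target_element(style_name, overrides, default="para"):
    """Look up the full key first, then fall back to the bare style name."""
    key = mapping_key(style_name, overrides)
    if key in ELEMENT_MAP:
        return ELEMENT_MAP[key]
    return ELEMENT_MAP.get(style_name, default)
```

With this sketch, target_element("Normal", {"size": 16, "bold": True}) looks up the key "Normal/b-sz16" and maps the paragraph to a title rather than an ordinary paragraph.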
This works well when there is some inherent logic to the implied structure of
the source document. It works less well when no regard has been given to
sensible style use.
Of course you are correct. When styles have been rigorously applied the results
can be very good too. In those (rare) cases this method still catches the
occasional accidental override.
~Ian
-----Original Message-----
From: Peter Flynn peter(_at_)silmaril(_dot_)ie
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: 29 October 2018 21:14
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Nesting a flat XML structure
On 29/10/18 21:04, ian(_dot_)proudfoot(_at_)itp-x(_dot_)co(_dot_)uk wrote:
> Agreed Wendell and Graydon. I am already doing multiple passes to get
> the content in a suitable state to do the nesting part. I find that
> most word processed text is in a poor state for easy conversion to
> good XML that is valid to a specific schema.
Microsoft's excellent marketing has successfully persuaded this planet that
"looking pretty" is the same thing as "being right".
> When based simply on paragraph and character style names the end
> result is often unusable.
IFF the styles are applied rigorously and in conformance with a known
stylesheet, it is actually possible to get fairly good transformations to (eg)
JATS, DocBook, TEI, etc.
> So I use temporary attributes that encode the important stylistic
> overrides - capturing what the author was trying to achieve. I have
> been very pleased with the results.
I'm very intrigued by this: where do you get the author's intentions from?
Traces they leave in the markup (eg italics or bold)?
///Peter