
Re: [xsl] Nesting a flat XML structure

2018-10-29 16:52:29
Peter,
Going a little off topic, but the concept is relatively simple. Many writers 
don't make the best use of their word processors. Lists may be manually 
indented, with bullets inserted from a character palette. Titles may be 'Normal' 
text with character overrides for font size and weight. You get the idea? 
Careful analysis of many documents showed that between eight and ten 
properties have the most effect on the output for character styles and 
paragraph styles. Overrides to these properties are recorded as an override 
code, in a format that is very compact but still readable by anyone. The 
combination of any correctly defined style name plus its override code gives us 
a key that can be used for mapping to elements in the output. 
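
As a rough illustration of the idea only (the real override format is not shown in this thread), a key of this kind could be built along the following lines. The property names, the one-letter codes, the "+" separator and the mapping table are all hypothetical, chosen just to make the sketch runnable.

```python
# Hypothetical sketch: property names, letter codes and the mapping
# table are illustrative assumptions, not the format described above.

# A handful of formatting properties, each with a one-letter code.
OVERRIDE_FLAGS = [
    ("bold", "B"),
    ("italic", "I"),
    ("underline", "U"),
    ("large_font", "L"),
    ("indented", "N"),
    ("bulleted", "T"),
]


def override_code(overrides):
    """Return a compact code for the overrides present, in a fixed order."""
    return "".join(code for name, code in OVERRIDE_FLAGS if overrides.get(name))


def style_key(style_name, overrides):
    """Combine the style name and its override code into one lookup key."""
    code = override_code(overrides)
    return f"{style_name}+{code}" if code else style_name


# Map keys to output element names (also hypothetical).
ELEMENT_MAP = {
    "Normal": "para",
    "Normal+BL": "title",
    "Normal+NT": "listitem",
    "Heading 1": "title",
}


def element_for(style_name, overrides, default="para"):
    """Look up the output element for a style name plus its overrides."""
    return ELEMENT_MAP.get(style_key(style_name, overrides), default)
```

With this table, a 'Normal' paragraph that is bold and enlarged resolves to the key "Normal+BL" and maps to a title, while a 'Normal' paragraph with no overrides stays an ordinary paragraph. Keeping the flags in a fixed order means the same set of overrides always produces the same key.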

This works well when there is some inherent logic to the implied structure of 
the source document, and less well when no regard has been given to sensible 
style use.

Of course you are correct. When styles have been rigorously applied, the results 
can be very good too. In those (rare) cases this method still catches the 
occasional accidental override.
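
One possible way to surface such accidental overrides, sketched under the assumption that each paragraph has already been reduced to a key of the form "StyleName" or "StyleName+CODE" (a hypothetical format), is to flag overridden keys that are rare relative to their base style. The function name and the 5% threshold are illustrative choices, not part of the method described here.

```python
from collections import Counter


def rare_overrides(keys, max_share=0.05):
    """Return overridden keys that make up at most max_share of their style's uses.

    Keys are assumed to look like "StyleName" or "StyleName+CODE".
    """
    key_counts = Counter(keys)
    # Count every use of a style, with or without an override code.
    style_counts = Counter(key.split("+", 1)[0] for key in keys)
    flagged = []
    for key, count in key_counts.items():
        style, _, code = key.partition("+")
        if code and count / style_counts[style] <= max_share:
            flagged.append(key)
    return sorted(flagged)
```

A single italic override among a hundred 'Body' paragraphs would be flagged for review, whereas an override used on half of the paragraphs of a style would be treated as deliberate.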

~Ian 

-----Original Message-----
From: Peter Flynn peter(_at_)silmaril(_dot_)ie 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> 
Sent: 29 October 2018 21:14
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Nesting a flat XML structure

On 29/10/18 21:04, ian(_dot_)proudfoot(_at_)itp-x(_dot_)co(_dot_)uk wrote:
> Agreed Wendell and Graydon. I am already doing multiple passes to get 
> the content in a suitable state to do the nesting part. I find that 
> most word processed text is in a poor state for easy conversion to 
> good XML that is valid to a specific schema.

Microsoft's excellent marketing has successfully persuaded this planet that 
"looking pretty" is the same thing as "being right".

> When based simply on paragraph and character style names the end 
> result is often unusable.

IFF the styles are applied rigorously and in conformance with a known 
stylesheet, it is actually possible to get fairly good transformations to (eg) 
JATS, DocBook, TEI, etc.

> So I use temporary attributes that encode the important stylistic 
> overrides - capturing what the author was trying to achieve. I have 
> been very pleased with the results.

I'm very intrigued by this: where do you get the author's intentions from? 
Traces they leave in the markup (eg italics or bold)?

///Peter
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--
