xsl-list

Re: [xsl] Does the new structure include the same text content?

2021-01-22 09:58:59
Hi Gerrit,
Good to know that I may be on the right track with the normalized text diff.
It would be almost impossible to go back to the original SGML structure from
the XML. The main difficulty is that much of the SGML structure relies on
inclusion exceptions to allow tables and figures in almost any location.
That SGML feature was always a recipe for untidy documents!
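Because inclusions let tables and figures land almost anywhere, the same text can legitimately appear in a different order after conversion, and that makes a straight sequential diff noisy. One complementary check, which nobody proposed in this thread, ignores order entirely and compares word counts, so any word that is lost or duplicated shows up in the difference. The following is a minimal Python sketch. It assumes the SGML source has already been turned into well-formed XML (for example with OpenSP's osx), because Python's standard library cannot parse SGML. The sample element names are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def word_counts(xml_text: str) -> Counter:
    """Count the whitespace-separated words in a document's text content.

    Assumes well-formed XML; SGML must be normalized to XML first.
    Text nodes are joined with spaces, so a word split by inline markup
    (e.g. <b>bold</b>ness) counts as two words on that side.
    """
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).split())


def content_delta(source_xml: str, target_xml: str) -> tuple[Counter, Counter]:
    """Return (missing, extra) word counts, ignoring word order.

    missing: words occurring more often in the source than in the target.
    extra:   words occurring more often in the target (added or duplicated).
    """
    src = word_counts(source_xml)
    tgt = word_counts(target_xml)
    # Counter subtraction drops zero and negative counts.
    return src - tgt, tgt - src


if __name__ == "__main__":
    # Hypothetical sample: the table moved and one word was dropped.
    source = "<dm><para>Remove the panel.</para><table><entry>Torque 5 Nm</entry></table></dm>"
    output = "<dm><table><entry>Torque 5 Nm</entry></table><para>Remove panel.</para></dm>"
    missing, extra = content_delta(source, output)
    print("missing:", dict(missing))
    print("extra:", dict(extra))
```

With the sample data, the moved table is not flagged, but the dropped word "the" is reported as missing and nothing is reported as extra. The trade-off is that a pure count cannot tell you where in the document the loss happened.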

Ian

-----Original Message-----
From: Imsieke, Gerrit, le-tex gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> 
Sent: 22 January 2021 11:45
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Does the new structure include the same text content?

Hi Ian,

Diffing normalized text output is a good approach in my experience.
However, if the 4.1 structures differ significantly from 1.7 as you say, it 
might be a good idea to transform the 4.1 output back to 1.7 prior to the diff. 
Or maybe not "transform it back to match the input exactly", but only to such a 
degree that the text files will be the same if no content was lost or 
duplicated.
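Gerrit's "only to such a degree" idea could also be applied at text-extraction time instead of as a full XSLT back-transformation. The approach is to leave out whatever the 4.1 output contains that has no counterpart in the 1.7 source, so that the two extracted texts should match when nothing was lost or duplicated. The following is a minimal Python sketch of that variant. The tag name `generatedTitle` is purely hypothetical and stands in for whatever the conversion really adds. A real skip list would have to come from the mapping between the two schemas.

```python
import xml.etree.ElementTree as ET


def comparable_text(elem: ET.Element, skip: frozenset = frozenset()) -> str:
    """Flatten an element's text, leaving out subtrees whose tag is in `skip`.

    A skipped child's tail text is kept, because it belongs to the parent.
    Whitespace is collapsed so layout differences do not matter.
    """
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        if child.tag not in skip:
            parts.append(comparable_text(child, skip))
        if child.tail:
            parts.append(child.tail)
    return " ".join(" ".join(parts).split())


# Hypothetical: elements the 4.1 conversion adds that have no 1.7 source text.
OUTPUT_ONLY = frozenset({"generatedTitle"})

if __name__ == "__main__":
    src = ET.fromstring("<dm><para>Remove the panel.</para></dm>")
    out = ET.fromstring(
        "<dm><generatedTitle>Auto title</generatedTitle><para>Remove the panel.</para></dm>"
    )
    same = comparable_text(src) == comparable_text(out, OUTPUT_ONLY)
    print("text matches:", same)
```

If the output uses namespaces, ElementTree reports tags as `{uri}local`, so the skip set would need to use that form.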

Gerrit

On 22.01.2021 12:28, ian(_dot_)proudfoot(_at_)itp-x(_dot_)co(_dot_)uk wrote:
Hi everyone,

I am working on a project to convert several thousand SGML files
(S1000D 1.7) into a more recent XML version (S1000D 4.1). My finished
XSLT stylesheet does the job that is expected. However, during
development I ran into a problem where an error in the stylesheet let
the output pass schema validation while silently omitting some
content! For me that’s very bad news, and I was lucky to notice it.
Ultimately the final output will be verified by subject matter
experts, but I really don’t want to give them any reason to doubt the
reliability of the conversion.

This got me thinking about ways to verify the output text content 
against the input despite significantly different structure. Is there 
an established way to do that? If so what is it called and how well 
does it work?

Perhaps it’s something that I should build into the XSLT as it is
written? Or perhaps it could be run afterwards as a batch comparison
in a post-processing step?
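If the check runs as a separate batch step after conversion, it stays completely independent of the XSLT. That is arguably an advantage, because a bug in the stylesheet cannot also hide itself in the check. The following is a minimal Python sketch. It assumes the 1.7 sources have already been normalized to XML and that each output file keeps its source file's name. Both are assumptions, and the pairing rule would need adjusting if the conversion renames files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def words(path: Path) -> list[str]:
    """Parse an XML file and flatten its text content into a list of words."""
    return " ".join(ET.parse(path).getroot().itertext()).split()


def batch_compare(src_dir: Path, out_dir: Path, pattern: str = "*.xml") -> dict[str, str]:
    """Compare every source file with the output file of the same name.

    Returns {file name: "ok" | "text differs" | "missing output"}.
    Word order matters here; use an order-insensitive count instead
    if the conversion legitimately moves tables or figures.
    """
    report = {}
    for src in sorted(src_dir.glob(pattern)):
        out = out_dir / src.name  # assumption: output keeps the source name
        if not out.exists():
            report[src.name] = "missing output"
        elif words(src) != words(out):
            report[src.name] = "text differs"
        else:
            report[src.name] = "ok"
    return report


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python compare_batch.py SOURCE_DIR OUTPUT_DIR
    results = batch_compare(Path(sys.argv[1]), Path(sys.argv[2]))
    for name, status in results.items():
        if status != "ok":
            print(f"{name}: {status}")
```

Reporting only the non-"ok" files keeps the output short enough to review across several thousand conversions.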

My initial thought is to output normalized text from input and output 
and compare the resulting text files…
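The normalized-text idea can be sketched in a few lines of Python. Putting one word per diff line makes the comparison indifferent to line breaks and markup, while it still pinpoints where text was dropped or duplicated. As before, this assumes the SGML has been normalized to XML first. `text_diff` and the file labels are illustrative names only.

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_words(xml_text: str) -> list[str]:
    """Flatten all text content to words, discarding markup and layout."""
    root = ET.fromstring(xml_text)
    return " ".join(root.itertext()).split()


def text_diff(source_xml: str, target_xml: str,
              source_name: str = "input", target_name: str = "output") -> list[str]:
    """Unified diff of the two documents' words, one word per diff line.

    An empty list means the normalized texts are identical.
    """
    return list(difflib.unified_diff(
        normalized_words(source_xml),
        normalized_words(target_xml),
        fromfile=source_name,
        tofile=target_name,
        lineterm="",
    ))


if __name__ == "__main__":
    before = "<para>Remove the access panel.</para>"
    after = "<p><b>Remove</b> the panel.</p>"
    for line in text_diff(before, after):
        print(line)
```

In the sample, the diff flags the missing word "access" with a leading "-". The change from `<para>` to `<p><b>` is ignored because only the words are compared.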

I’ve searched the archives, but I probably don’t know the correct 
terminology to get any useful results…

Thanks in advance for all responses.

Ian

Ian Proudfoot
Bembridge
Isle of Wight
United Kingdom