Re: [xsl] Reasons for using XSLT to validate XML instances?

2015-06-23 16:24:42
Hi,

I see the variation among schema and validation technologies not so
much as functional variation (even given their large overlaps) as a
difference in what each means by "validation", and in the kinds of
rules and constraints that systems wish to manage through external
checks of this kind -- both internally and at their boundaries.

This being the case, each of the named approaches to 'validation'
will inevitably have its strengths. This includes straight-up XSLT,
or a metalanguage of your own design and implementation.

Grammar-based schema languages such as DTD and RNG (and XSD's complex
types) have historical roots in typesetting systems because of their
particular and peculiar strengths for dealing with mixed content,
arbitrary recursive structures, and implicit structures, all of which
are challenging for regulated, automated information processing
systems and all of which are characteristic of print media (which
these systems were designed to support). Remember this was before
there was a unified query syntax for marked-up content, which could
also help discern the models hidden in the tag thickets. No XPath!
256KB RAM available for processing! Markup indeed could not be parsed
at all without prior knowledge shared formally between parties (as
DTDs and/or other descriptions and configurations), and grammars were
and are a great way to do this -- concise, elegant, powerful,
expressive for the sorts of structures we see in documents.

("Validate" back then meant something different: it meant "available
for processing" since without a grammar, a document could not even be
parsed! But XML's well-formedness constraints changed the game, to the
point where now "validation" more or less amounts to a privileged
status check of some sort, architecturally not much different from any
other query.)
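
To make the "not much different from any other query" point concrete,
here is a minimal sketch in Python (standard library only). The rule
itself ("every sec has a title child") and the sample document are
made up for illustration, not taken from any real schema; the point is
only that the check is a query whose empty result we choose to call
"valid".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<article>
  <sec><title>Intro</title><p>Text.</p></sec>
  <sec><p>No title here.</p></sec>
</article>"""

def untitled_sections(root):
    # The "validation" is just a query: select every sec
    # element that has no title child.
    return [sec for sec in root.iter("sec") if sec.find("title") is None]

root = ET.fromstring(DOC)
failures = untitled_sections(root)
if failures:
    print(f"{len(failures)} sec element(s) lack a title")
else:
    print("valid")
```

A Schematron assertion does essentially the same thing with XPath,
and adds a human-readable message when the query comes back non-empty.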

Yet even now, getting Schematron to validate a content model seems
like a fool's errand (or at the very best a parlor game at certain
conferences I can think of), if the content model is something like

(sec-meta?, ((label, title?) | title), (address | alternatives | array
| boxed-text | chem-struct-wrap | code | fig | fig-group | graphic |
media | preformat | supplementary-material | table-wrap |
table-wrap-group | disp-formula | disp-formula-group | def-list | list
| tex-math | mml:math | p | related-article | related-object |
disp-quote | speech | statement | verse-group)*, (sec)*, (fn-group |
glossary | ref-list)*)

http://jats.nlm.nih.gov/publishing/tag-library/1.1d3/element/sec.html

Especially given how easy it is to describe even a complex content
model in DTD or RNG syntax.
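
For anyone who wants to see why the grammar wins here, a rough sketch
in Python: a content model is at bottom a regular expression over the
sequence of child element names, so it can be written down almost
verbatim. This is only an illustration under assumptions, not how any
DTD or RNG processor is actually implemented. The model below is an
abbreviated stand-in for the sec model above (only a handful of the
block-level alternatives are listed), and the helper names are my own.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the <sec> content model quoted above:
# only a few of the block-level alternatives are listed (assumption).
BLOCKS = ["p", "fig", "list", "table-wrap", "disp-quote"]
BACK = ["fn-group", "glossary", "ref-list"]

def tok(name):
    # Each child becomes one token "name " so names cannot run together.
    return re.escape(name) + " "

def alt(names):
    # Non-capturing group matching any one of the given element names.
    return "(?:" + "|".join(tok(n) for n in names) + ")"

# (sec-meta?, ((label, title?) | title), (blocks)*, (sec)*, (back)*)
SEC_MODEL = re.compile(
    "(?:" + tok("sec-meta") + ")?"
    + "(?:" + tok("label") + "(?:" + tok("title") + ")?|" + tok("title") + ")"
    + alt(BLOCKS) + "*"
    + "(?:" + tok("sec") + ")*"
    + alt(BACK) + "*"
)

def sec_is_valid(sec):
    # Flatten the children to a string of names and match the whole thing.
    children = "".join(child.tag + " " for child in sec)
    return SEC_MODEL.fullmatch(children) is not None

def all_secs_valid(root):
    # Check every sec in the tree, nested ones included.
    return all(sec_is_valid(sec) for sec in root.iter("sec"))
```

Usage would be something like
all_secs_valid(ET.fromstring("<sec><title/><p/><sec><label/></sec></sec>")).
Expressing the same sequence-and-choice logic as a set of independent
XPath assertions is where it turns into the parlor game.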

It remains a question whether and which kinds of systems can best
exploit grammar-based validation, of course -- especially once we have
XSLT streaming, and can do so much else. That can be argued, although
I can offer from experience that publishing systems (the kind I know
best) are among the kinds of systems for which a tag grammar of some
kind is more or less indispensable. Data management without it is just
too hard -- it's hard enough even if you have a bad or weak grammar,
not well fitted to the data (most HTML comes to mind).

Validating against grammars in XPath/XSLT should, I hope, remain a
research area, along with other strategies. An RNG processor in XSLT,
why not? Or ... how about a CSS-based validation technology, in which
assertions and queries were bound to elements via CSS selectors?

So I expect many systems at least will never settle on a single
approach to validation, but will combine technologies, sometimes using
all three (schema, Schematron, XSLT) as well as even more creative
approaches to document management, contracting and QA ...

Cheers, Wendell


On Sat, Jun 20, 2015 at 12:11 PM, Mukul Gandhi 
gandhi(_dot_)mukul(_at_)gmail(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hello Roger,
   I would like to comment on the suitability of using XSLT to validate
XML instances. I agree with your other points that XML Schema and Schematron
can validate XML instances. I also know that RelaxNG is a suitable
language for validating XML instances.

XSLT is a language for transforming XML instances or text input documents into
other XML instances or text. XSLT is able to import XML Schemas (XSDs, I
believe), which allow schema-aware _XSLT transformations_. I would
therefore suggest that XSLT is not a native XML validation facility; i.e.,
XSLT (2.0 and later) can use one or more schema documents to
validate input XML documents before supplying the XSD-annotated input
documents to an XSLT transform.

I hope this makes my points clear.


On 19 June 2015 at 21:19, Costello, Roger L. costello(_at_)mitre(_dot_)org
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi Folks,

XML Schema can validate XML instances.

Schematron can validate XML instances.

Is there ever a situation where it would be preferable to use XSLT to
validate XML instances?

/Roger




--
Regards,
Mukul Gandhi



-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
