Re: [xsl] Unit tests in XSLT
2018-10-15 16:54:38
Hi Pieter,
I support your assessment of unit tests / TDD in XSLT, at least for the
kind of complex multi-pass upconversion pipelines that we often deal with.
tl;dr
Given the complexity of configurable, multi-pass upconversion pipelines,
I don’t think test-driven development with XSpec is suited to cover
these cases well. An infrastructure at which you can simply throw
an input file and its normalized, checked output(s) after, say, 50 XSLT
passes will give you much better coverage with significantly less effort.
We rely heavily on integration tests. They are also a bit cumbersome to
set up for each converter configuration, but adding more test input (for
example, docx or IDML) and expected output files is then fairly easy. In
our experience, having 10 diverse input files (whole books or book
chapters) for each production line and having ~30 production lines with
integration tests gives you good test coverage to spot unwanted side
effects of XSLT changes in the libraries that we use in these
converters. Quite often side effects of changes in some XSLT mode in a
library pop up in unexpected places!
The integration tests are basically: When one or more libraries have
been changed, we convert the test sets of each converter configuration
that uses the libraries, with the most recent versions of the libraries.
Then we normalize the conversion output, for example, TEI XML output and
EPUB content & metadata files + SVRL output wrapped in some top-level
element, by stripping away generated IDs or file system paths and by
using indented output, and then we do a plain line-wise diff against
reference data. Then we just add the exit statuses of the diff
invocations to see how many tests failed (output differs from expected).
If all tests pass for a converter, the libraries can automatically be
promoted to the most recent commits.
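
The normalize-and-diff step described above could be sketched roughly as
follows. This is only an illustration in Python, not the actual Jenkins
setup: the regular expressions for generated IDs and file system paths and
all function names are assumptions, and the conversion output is assumed
to be serialized with indentation already.

```python
import difflib
import re

# Hypothetical pattern for processor-generated IDs such as id="d1e42".
GENERATED_ID = re.compile(r'="d\d+e\d+"')
# Hypothetical pattern for the directory part of file: URIs.
FILE_PATH = re.compile(r'file:/+[^\s"\'<>]*/')


def normalize(text):
    """Mask generated IDs and file system paths, return non-empty lines."""
    text = GENERATED_ID.sub('="GENERATED_ID"', text)
    text = FILE_PATH.sub("file:/PATH/", text)
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def line_diff(actual, expected):
    """Plain line-wise diff of normalized output against reference data."""
    return list(
        difflib.unified_diff(
            normalize(expected),
            normalize(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


def diff_status(actual, expected):
    """Mimic the exit status of diff: 0 if identical, 1 if different."""
    return 1 if line_diff(actual, expected) else 0


def count_failures(pairs):
    """Add up the statuses over (actual, expected) pairs of one test set."""
    return sum(diff_status(actual, expected) for actual, expected in pairs)


if __name__ == "__main__":
    actual = '<p id="d1e42" src="file:/home/alice/run1/img.png">x</p>\n'
    expected = '<p id="d7e9" src="file:/srv/jenkins/ref/img.png">x</p>\n'
    changed = '<p id="d1e42" src="file:/home/alice/run1/img.png">y</p>\n'
    failures = count_failures([(actual, expected), (changed, expected)])
    print(f"{failures} test(s) failed")
```

A converter's libraries would then only be promoted when the failure count
for its whole test set is zero.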
One downside of these black-box integration tests is that they can run
for some time. A typical IDML→intermediate XML→TEI/BITS→HTML→EPUB
conversion for a 500-page book can well run for 4 minutes (including
lots of Schematron validations and patching the error messages into the
HTML rendering), so a test for a converter configuration typically needs
20–60 minutes to complete.
These long turnaround times in particular have led some of my colleagues
to set up XSpec unit tests for some of our individual XSLT libraries:
https://github.com/search?l=XML&p=1&q=org%3Atranspect+xspec&type=Code
Whenever there is a commit to a library, Travis will run the XSpecs.
Setting up XSpec/Travis for a library takes about as much effort as setting up
an integration test (we use Jenkins and some bespoke code) for a
converter configuration (a converter configuration consists of some
libraries with fixed svn external or git submodule revisions and some
conversion-specific XSLT and XProc glue/override code). Say you need
something between 1 and 4 hours to set up the testing infrastructure,
but then you haven’t set up a single XSpec test yet.
On the other hand, adding one more test file and its normalized
reference output is a matter of minutes for integration tests.
Some colleagues didn’t feel too comfortable with my integration-only
approach to testing, on the one hand because you don’t know the coverage
of your tests, on the other hand because the tests run for such a long
time. This is really the downside of the ease with which you can create
new tests. If you see a customer document that challenges your
list-nesting templates, you are tempted to add the whole document to the
test set, rather than stripping it down to a minimal repro. (Often
stripping the phenomenon down to a minimal test is not so easy because
the error will go away if you remove the “wrong” parts of the input file.)
The two-layered approach, relatively fast-running XSpec tests for
individual libraries and long-running integration tests for whole
converter configurations, has reportedly saved the colleagues who use
XSpec some time because they spot unwanted side effects more quickly.
However, they also admit that it would take person-weeks for each of our
~20 more important libraries to write XSpec tests that would give
“enough” coverage.
Because the input is so stripped-down for each test case, I doubt that
these unit tests will cover the common case when the input is just
unexpected. And this doesn’t mean unexpected customer input. It can
happen too easily if the maintainer of one upstream library (for example,
IDML to flat intermediate format) changes some markup conventions. Then,
in principle, all unit tests of subsequent conversion steps need to be
updated, but this is often beyond the control or even the knowledge of the
library maintainer.
This is made worse by the fact that each converter configuration can
override the XSLT of most subsequent steps (hierarchizing the flat XML,
converting it to TEI or BITS, converting it to HTML, etc.), so there is
no single XSpec that will cover it all.
In addition, unit tests are only for a single pass in a single XSLT
mode. Our pipelines typically consist of something in the order of
magnitude of 50 XSLT passes over a document. And more than half of them
can be customized. This is again a trade-off. We prefer having many
modes where each mode doesn’t do too much (two modes hierarchize
sections by heading level using xsl:for-each-group and postprocessing, four
modes detect list nesting by the amount of indentation, etc.) over a
convoluted, difficult-to-debug pipeline even if that pipeline runs in 2
rather than 4 minutes.
So I’m not opposed to XSpec tests in our conversion pipelines, but for
me, this is only an optional optimization that saves developer time.
Our development, like XSLT in general, is very much input-driven rather
than (unit-) test-driven. We see input, we configure a converter, we see
more input, we tweak the converter. The conversion result for the first
input will inevitably become broken in some way. We set up integration
tests with expected outputs. We receive more input. We tweak the
conversion again. We see more interesting errors. We add test files. We
will also frequently update the expected outputs because there will be
occasional improvements in the libraries. But it is always input-first,
not test-first.
One colleague has found a middle ground where he used our Jenkins
integration test infrastructure to create a test set with lots of docx
and IDML files containing *lists* and only check the list numbering and
nesting part of our upconversion. This is an integration test that
focuses on a single aspect, albeit one that spans four XSLT passes rather
than a single one. But this is not really test-driven development where you write
tests before the code. It will just check whether a later-stage tweak to
existing code will likely break existing production pipelines.
Gerrit
On 15.10.2018 21:41, Pieter Masereeuw pieter(_at_)masereeuw(_dot_)nl wrote:
Talking about debugging, I recently attended a webinar by Oxygen about
unit tests for XSLT (and Schematron). After watching it, I felt guilty
that my XSLT development practice is still not test-driven. Despite
what was said in the webinar, it seems too much trouble and the thing
that most often requires modification of stylesheets is changes or
hitherto unforeseen constructs in the format of the input XML.
It would be interesting to know if there are readers on this list who
actually make use of unit tests during XSLT development and what their
experiences are.
Pieter Masereeuw
On 10/15/2018 07:28 PM, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
Yes, in general you want debug messages to be turned off, which is why the doDebug
parameter default is "false()" in my code.
I didn't think of use-when="$DEBUG or true()"--that would work for a lot of my
cases but it doesn't handle the case where I want to turn on debugging for all the
templates that will get called in the course of handling some specific input.
So that suggests that my dynamic approach is what I need generally.
Once the code is in place and working then it would be easier to set up a set
of debugging control variables that reflect different cases or code paths I
know I might need to debug in the future but during development that doesn't
really work because of course the code is in flux and you don't necessarily know
what will be of interest and what won't.
It might work to have per-module static debug controls. I've moved generally to
using more smaller modules, usually one per distinct mode or set of related
modes, and that would make it more natural to have global debugging for those
modes. I'll have to think about that more.
In practice my debugging pattern isn't a burden--it's something I do whenever I
set up a new template or add an apply-templates or next-match or call-template
but it's always felt like there should be a simpler way.
Cheers,
Eliot
--
Eliot Kimber
http://contrext.com
On 10/15/18, 12:11 PM, "Michael Kay mike(_at_)saxonica(_dot_)com"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
With static variables you can of course have multiple switches but they will be
statically scoped rather than dynamically scoped. You could use multiple variables or you
could use flags within a single variable (use-when="contains($DEBUG_FLAGS,
'g')").
I have to confess I'm not usually that organized. I tend to have a single variable $DEBUG which is false, and then switch on individual debug lines using use-when="$DEBUG or true()". I tend to find that debug statements are rarely useful once you've solved the bug that they were invented for; except in rare cases where you persistently have problems with some particular intermediate result passed across a key interface in your application - in which case there may be better approaches than xsl:message to monitoring what's passed across that boundary.
But I wouldn't recommend anyone to be as disorganised as me.
Michael Kay
Saxonica
> On 15 Oct 2018, at 17:05, Eliot Kimber ekimber(_at_)contrext(_dot_)com <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>
> I was just about to post about this.
>
> In my XSLT 2 code I have historically used this pattern:
>
> <xsl:template match="foo">
>   <xsl:param name="doDebug" as="xs:boolean" tunnel="yes" select="false()"/>
>
>   <xsl:if test="$doDebug">
>     <xsl:message>+ [DEBUG] Handling <xsl:value-of select="concat(name(..), '/', name(.))"/>...</xsl:message>
>   </xsl:if>
>
>   <xsl:apply-templates>
>     <xsl:with-param name="doDebug" as="xs:boolean" tunnel="yes" select="$doDebug"/>
>   </xsl:apply-templates>
> </xsl:template>
>
> This allows me to selectively turn debugging on and off in specific
parts of the code but does require this somewhat heavy weight code.
>
> With @use-when, can I get the same level of local control?
>
> That is, with the above, I can add:
>
> <xsl:variable name="doDebug" as="xs:boolean" select="true()"/>
>
> In any block to turn debugging on just there.
>
> If I understand the implications of static variables allowed in
@use-when, the debugging switch is globally all-or-nothing, or at least global
within a given package.
>
> Is that correct?
>
> If that is correct, is there a better way to do the selective,
dynamically-controlled debug messaging shown above?
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 10/15/18, 9:08 AM, "Michael Kay mike(_at_)saxonica(_dot_)com"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>
> These days you can do
>
> <xsl:message use-when="$DEBUG" ....>
>
> with $DEBUG defined as a static parameter.
>
> <xsl:param name="DEBUG" as="xs:boolean" static="true" select="false()"/>
>
> No need for the run-time check with xsl:if.
>
> You can also use xsl:assert to define assertions. In Saxon, assertion
checking can be enabled from the command line using -ea.
>
> Michael Kay
> Saxonica
>
>> On 15 Oct 2018, at 14:54, Dave Pawson dave(_dot_)pawson(_at_)gmail(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>>
>> Might even surround it with
>> <xsl:if test="$debug">
>>
>> To ease insertion / removal when testing?
>>
>> HTH
>> On Mon, 15 Oct 2018 at 14:31, Wendell Piez
wapiez(_at_)wendellpiez(_dot_)com
>> <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>>>
>>> Eliot writes:
>>>
>>>> I also depend heavily on using messages to test my assumptions.
>>>
>>>> For example, I might do something like:
>>>
>>>> <xsl:message>+ [DEBUG] jpeg_few={$jpeg_few => string-join(', ')}</xsl:message>
>>>> <xsl:message>+ [DEBUG] jpeg_many={$jpeg_many => string-join(', ')}</xsl:message>
>>>
>>> This is a key technique when developing XSLT. The language is designed
>>> to "fail gracefully" most of the time -- which puts the burden on the
>>> programmer to ensure things don't fail catastrophically. :-)
>>>
>>> Cheers, Wendell
>>>
>>> On Sun, Oct 14, 2018 at 7:10 PM Eliot Kimber
ekimber(_at_)contrext(_dot_)com
>>> <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>>>>
>>>> Looking at the XPath 3 Functions and Operators specification and searching on
"intersect" (hoping to also find "disjoint") I find this discussion:
>>>>
>>>> D.4.2.3 eg:value-except
>>>> eg:value-except( $arg1 as xs:anyAtomicType*,
>>>> $arg2 as xs:anyAtomicType*) as xs:anyAtomicType*
>>>> This function returns a sequence containing all the distinct items
that appear in $arg1 but not in $arg2, in an arbitrary order.
>>>>
>>>> XSLT implementation
>>>>
>>>> <xsl:function name="eg:value-except" as="xs:anyAtomicType*">
>>>>   <xsl:param name="arg1" as="xs:anyAtomicType*"/>
>>>>   <xsl:param name="arg2" as="xs:anyAtomicType*"/>
>>>>   <xsl:sequence
>>>>     select="fn:distinct-values($arg1[not(.=$arg2)])"/>
>>>> </xsl:function>
>>>>
>>>> Which is in https://www.w3.org/TR/xpath-functions-31/#other-functions (Appendix D).
>>>>
>>>> So basically
>>>>
>>>> distinct-values($jpeg_few[not(. = $jpeg_many)])
>>>>
>>>> Should give you the answer you seek.
>>>>
>>>> I agree with Mike that being obsessive about putting data types on
all variables and function return values (and templates when the templates should return
atomic types or specific element types) will help a lot.
>>>>
>>>> If your code is working without types but failing with them it means your code
is "working" but probably not for the reasons you think.
>>>>
>>>> Working carefully through the stages of the expressions by setting
>>>> each intermediate result into a variable will help a lot.
>>>>
>>>> I also depend heavily on using messages to test my assumptions.
>>>>
>>>> For example, I might do something like:
>>>>
>>>> <xsl:message>+ [DEBUG] jpeg_few={$jpeg_few => string-join(', ')}</xsl:message>
>>>> <xsl:message>+ [DEBUG] jpeg_many={$jpeg_many => string-join(', ')}</xsl:message>
>>>>
>>>> Or if those lists are very long, use count() or get the first n items
or whatever to make it clear that you're working with the values you think you are.
>>>>
>>>> Also, remember that <xsl:value-of> ({} in string result contexts) is
different from <xsl:sequence>, which returns the actual value, not a string representation.
>>>>
>>>> For example, given a variable that is an attribute node, value-of will
>>>> return the string value of the attribute but xsl:sequence will return the
>>>> attribute node and Saxon will serialize it as <attribute name="foo"
>>>> value="bar"> (or something similar to that).
>>>>
>>>> It's easy to accidentally create a sequence of attributes when what you
>>>> wanted was a sequence of strings (or vice versa) and using xsl:value-of
>>>> can obscure that mistake.
>>>>
>>>> I've also started using the explicit casting of values that XQuery
>>>> requires, even though XSLT usually lets you get away with implicit
>>>> casting, because it makes it clearer to me what my intent was (and makes
>>>> it easier to copy XPath expressions into XQuery, if that's something you
>>>> need to do).
>>>>
>>>> Cheers,
>>>>
>>>> Eliot
>>>> --
>>>> Eliot Kimber
>>>> http://contrext.com
>>>>
>>>>
>>>> On 10/14/18, 3:53 PM, "Dave Lang emaildavelang(_at_)gmail(_dot_)com"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>>>>
>>>>> That error can only come from an expression that calls tokenize().
It's therefore clearly not your declaration of jpgs_in_xml_not_directories that's at fault.
>>>>
>>>> Fair enough - but when I run the transformation without that
declaration
>>>> everything works fine. Is there something I can do to the variables
that
>>>> are included in it to make the declaration work?
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Wendell Piez | http://www.wendellpiez.com
>>> XML | XSLT | electronic publishing
>>> Eat Your Vegetables
>>> _____oo_________o_o___ooooo____ooooooo_^
>>>
>>
>>
>>
>> --
>> Dave Pawson
>> XSLT XSL-FO FAQ.
>> Docbook FAQ.
>>
>
>
>
>
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt