Re: [xsl] Top 10 XSLT patterns

On 4/3/14 11:33 AM, Abel Braaksma (Exselt) wrote:

It will likely be non-trivial to compile such list without a good query
to search through existing stylesheets and known programming challenges.
But from your experience, what patterns do you encounter most often?

Here are some very specific concrete examples which have come up a lot for us when processing large texts. I don't see how they map to the patterns you all are discussing, but they are probably combinations of them in some way?

Something we've had to implement multiple times in various combinations (XSLT 1, 2, XQuery, JDOM/Java) is what I call the "proem" extractor: pull out the first N characters (or words) from a document, maintaining all of the ancestral markup. A more elaborate variant is to extract an intermediate section that could be defined in various ways (characters N to N+100, everything between two <mark> elements, etc). I don't know what to call that -- tree surgery? Typically the idea is to generate document summaries, hit highlighting, or annotated passages.
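
To make the "first N characters" case concrete, here is a minimal sketch. It uses Python's standard ElementTree rather than XSLT purely to keep it short, and the function names (proem, _truncate) are made up for illustration. It assumes the budget counts every text character, whitespace included, and it keeps the element in which the budget runs out while dropping everything after it.

```python
import copy
import xml.etree.ElementTree as ET


def proem(root, limit):
    """Return a truncated copy of root holding the first `limit` characters."""
    out = copy.deepcopy(root)  # leave the source document untouched
    _truncate(out, limit)
    return out


def _truncate(elem, remaining):
    """Trim elem in place; return the character budget still unused."""
    if elem.text:
        elem.text = elem.text[:remaining]
        remaining -= len(elem.text)
    for child in list(elem):
        if remaining <= 0:
            elem.remove(child)  # budget spent: drop the child and its tail
            continue
        remaining = _truncate(child, remaining)
        if child.tail:
            child.tail = child.tail[:remaining]
            remaining -= len(child.tail)
    return remaining


doc = ET.fromstring("<doc><p>Hello <b>big</b> world</p><p>second</p></doc>")
summary = ET.tostring(proem(doc, 8), encoding="unicode")
```

The ancestors of the cut point (<doc>, <p>, <b>) survive with their attributes, which is the part that is awkward to do with plain substring operations.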

Another major problem for us has been reference resolution, in which a set of documents is marked up with cross references to other documents, or sub-documents, and the problem is to copy some part of the referenced document into the reference (as a performance optimization, so it doesn't have to be looked up later). The basic idea is simple enough, but is complicated by very large numbers of large documents with large numbers of references. Another complication is that the document corpus may be constantly evolving; as new documents are introduced, both outbound *and inbound* references must be resolved.

There are lots of variants of this reference resolution problem: simple links, abbreviation expansion, footnote inlining. Footnotes are especially challenging since they may contain further references to additional footnotes, so the expansion is recursive (and inevitably, circular). References might be to non-XML documents and trigger non-XML processing: specifically for image files, we would typically want to store a reference to the image file indicating whether it exists (and where, if we had to hunt for it), its size, format, etc.
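
The recursive footnote case, with a guard against circular references, might look roughly like the sketch below (again Python and ElementTree for brevity). The markup is invented for the example: <ref target="..."/> points at <note id="...">, and a reference that would loop is left unexpanded and flagged with a hypothetical status="circular" attribute. None of these names come from a real schema.

```python
import xml.etree.ElementTree as ET


def inline_notes(elem, notes, active=()):
    """Return a copy of elem with each resolvable <ref> replaced by its note.

    `notes` maps note ids to <note> elements; `active` holds the ids that
    are currently being expanded, which is how cycles are detected.
    """
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        target = child.get("target")
        if child.tag == "ref" and target in notes:
            if target in active:
                # circular reference: keep the pointer, do not expand again
                kept = ET.SubElement(out, "ref", dict(child.attrib))
                kept.set("status", "circular")
                kept.tail = child.tail
                continue
            expanded = inline_notes(notes[target], notes, active + (target,))
            expanded.tail = child.tail  # the note takes the ref's place
            out.append(expanded)
        else:
            # ordinary element (or unresolvable ref): copy it recursively
            out.append(inline_notes(child, notes, active))
    return out


doc = ET.fromstring(
    '<doc><p>See<ref target="a"/>.</p>'
    '<note id="a">A cites<ref target="b"/></note>'
    '<note id="b">B cites<ref target="a"/></note></doc>'
)
notes = {n.get("id"): n for n in doc.iter("note")}
para = inline_notes(doc.find("p"), notes)
```

Here note "a" is inlined into the paragraph, note "b" is inlined into "a", and the reference from "b" back to "a" is the one that gets flagged instead of expanded.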

A very common feature of all of our pipelines is chunking. The canonical example is pulling all the chapters out of a book document and creating standalone chapter documents plus a skeletal book document that serves as cover page and table of contents. We usually want to preserve some ancestral markup in the "chapters", and since we are generating new documents, we need to keep track of references to these documents for the TOC, for next/previous navigation links and for translating/resolving other cross-references that were intra-document, but have become inter-document.
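
A rough sketch of the chunking step, under invented assumptions: chapters are direct <chapter> children of the root, cross-references are <xref target="id"/>, output files are named chapter-N.xml, and navigation is recorded as prev/next attributes on the wrapper. All of those names are hypothetical. It is Python again; in XSLT 2.0 the writing of the chunks would more naturally be xsl:result-document.

```python
import copy
import xml.etree.ElementTree as ET


def chunk_book(book):
    """Split a book into a skeletal TOC document and per-chapter documents."""
    chapters = book.findall("chapter")
    names = [f"chapter-{i}.xml" for i in range(1, len(chapters) + 1)]

    # Record which output file each id ends up in, for rewriting xrefs.
    home = {}
    for chapter, name in zip(chapters, names):
        for node in chapter.iter():
            if "id" in node.attrib:
                home[node.get("id")] = name

    # Skeleton: the root and its non-chapter children, plus a TOC.
    skeleton = ET.Element(book.tag, dict(book.attrib))
    for child in book:
        if child.tag != "chapter":
            skeleton.append(copy.deepcopy(child))
    toc = ET.SubElement(skeleton, "toc")

    docs = {}
    for i, (chapter, name) in enumerate(zip(chapters, names)):
        entry = ET.SubElement(toc, "entry", href=name)
        entry.text = chapter.findtext("title", "")
        # Preserve ancestral markup: wrap the chapter in a copy of the root tag.
        wrapper = ET.Element(book.tag, dict(book.attrib))
        if i > 0:
            wrapper.set("prev", names[i - 1])
        if i < len(names) - 1:
            wrapper.set("next", names[i + 1])
        wrapper.append(copy.deepcopy(chapter))
        # Intra-document references may now point into another file.
        for xref in wrapper.iter("xref"):
            target = xref.get("target")
            if target in home:
                prefix = "" if home[target] == name else home[target]
                xref.set("href", f"{prefix}#{target}")
        docs[name] = wrapper
    return skeleton, docs


book = ET.fromstring(
    '<book lang="en"><title>T</title>'
    '<chapter id="c1"><title>One</title><p>See <xref target="c2"/></p></chapter>'
    '<chapter id="c2"><title>Two</title><p>Here: <xref target="c2"/></p></chapter>'
    "</book>"
)
skeleton, docs = chunk_book(book)
```

The id-to-file map is built before any chunk is produced, so a reference in chapter 1 to something in chapter 2 can be rewritten without a second pass.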

Another dumb thing we do all the time is run a list of XPaths over a document and save the results into a Java object for easy access in our application framework. This is just a simplified version of marshalling (or unmarshalling?) to cross the language barrier (we call it xml mapping). We also use XSLT to render these XML documents as HTML, but when we need (usually atomic) values to be handled by our Java application layer, we want an easy way to extract them from the XML. For large numbers of paths, I think we would be better off doing this with a single generated XSLT (so we don't have to traverse the document once per path), but currently we don't do that.
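
The single-traversal idea can be sketched for the simplest case only: absolute paths made of plain child steps, with no predicates, axes or namespaces. This is Python rather than a generated XSLT, the function name and the field names are hypothetical, and a real XPath list would need a proper matcher instead of the string comparison used here.

```python
import xml.etree.ElementTree as ET


def extract(root, paths):
    """Collect text values for many simple paths in one walk of the tree.

    `paths` maps a field name to an absolute path such as "/doc/meta/title".
    Returns a dict mapping each field name to the list of matching values.
    """
    wanted = {}
    for field, path in paths.items():
        wanted.setdefault(path, []).append(field)
    result = {field: [] for field in paths}

    def walk(elem, path):
        # Each element is visited once and checked against all paths at once.
        for field in wanted.get(path, ()):
            result[field].append("".join(elem.itertext()).strip())
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return result


doc = ET.fromstring(
    "<doc><meta><title>Moby Dick</title><author>Melville</author></meta>"
    "<body><p>Call me</p><p>Ishmael</p></body></doc>"
)
values = extract(doc, {
    "title": "/doc/meta/title",
    "author": "/doc/meta/author",
    "paras": "/doc/body/p",
})
```

The cost is one pass over the document plus a dictionary lookup per element, regardless of how many paths are in the list.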


I hope that's useful.

-Mike
