xsl-list

Re: XSL pattern needed for begin/end elements

2004-07-07 10:01:43
Hi Tracy,

This is a hard problem, and one for which XSLT was not designed.

Nonetheless, there is enough experience around that some guidance is possible.

At 12:14 PM 7/7/2004, you wrote:
I'm looking for an XSL pattern to solve the problem of going from XML
that has separate begin and end elements to one that does not.

In other words, the "separate begin and end elements" are merely markers for something that is not yet an element (actually a sequence of nodes), which you want to turn into an element.

This makes it an up-conversion: you want to "wrap" a set of nodes in a new element, depending on their relations to other nodes (their "markers").

Please, please note that I do not control either the source or target
XML formats.  If I did, this would be much easier.

Or not: the problem these formats are trying to solve is arguably one that XML does not handle well. Much depends on how the problem is scoped. As you imply, a much simpler solution may be possible if it is scoped more narrowly.

Scoped broadly, this is the problem of "multiple concurrent hierarchies" (shorthand: "overlap"), which is a fairly hot research area: see the preliminary program for the Extreme conference in Montreal, at http://www.mulberrytech.com/Extreme/Program.html, especially Wednesday, "Overlap Day".

Source XML snip:

<doc>
  <hyperlink_begin id="111" end="222">
    <locator_url protocol="http" host_name="www.sf.net"/>
  </hyperlink_begin>
  <text_run>Click</text_run>
  <text_run emphasis="bold">here.</text_run>
  <hyperlink_end id="222" begin="111"/>
</doc>

Target XML example:

<cod>
  <HyperLink xlink:href="http://www.sf.net">
    Click <b>here.</b>
  </HyperLink>
</cod>

In my case I can assume that associated begin and end hyperlink tags
will occur as siblings -- though generally this is not the case and in
fact, this is the reason the begin and end tags are unique elements.

If you can bank on this assumption, you can address the problem with "positional grouping". There are two main approaches to this in XSLT 1.0 (covered in the FAQ), but neither is as clean and simple as the XSLT 2.0 grouping construct (xsl:for-each-group), which you have available in Saxon 8.
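For readers who want to see the sibling-walking logic spelled out, here is a minimal sketch. It is not XSLT: it expresses the same positional-grouping idea in Python with the standard library's xml.etree.ElementTree, assuming (as Tracy says) that each begin/end pair are siblings and that links do not nest. The function names and the run mapping (emphasis="bold" to <b>) are hypothetical illustrations, not part of either format's specification.

```python
import xml.etree.ElementTree as ET

# The xlink namespace used by the target format; registering the prefix
# makes ElementTree serialize it as "xlink:" rather than "ns0:".
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def append_text(parent, text):
    """Append text to parent, following ElementTree's text/tail model."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def copy_run(run, parent):
    """Hypothetical run mapping: emphasis="bold" becomes <b>, others plain text."""
    if run.get("emphasis") == "bold":
        ET.SubElement(parent, "b").text = run.text or ""
    else:
        append_text(parent, run.text or "")


def convert(doc):
    """Wrap the siblings between each begin/end marker pair in a HyperLink.

    Assumes markers are siblings and links do not nest.
    """
    out = ET.Element("cod")
    link = None          # the currently open HyperLink wrapper, if any
    expected_end = None  # id of the hyperlink_end that will close it
    for child in doc:
        if child.tag == "hyperlink_begin":
            loc = child.find("locator_url")
            href = f"{loc.get('protocol')}://{loc.get('host_name')}"
            link = ET.SubElement(out, "HyperLink", {f"{{{XLINK}}}href": href})
            expected_end = child.get("end")
        elif child.tag == "hyperlink_end" and child.get("id") == expected_end:
            link, expected_end = None, None  # close the wrapper
        elif child.tag == "text_run":
            copy_run(child, link if link is not None else out)
    return out


if __name__ == "__main__":
    source = """<doc>
  <hyperlink_begin id="111" end="222">
    <locator_url protocol="http" host_name="www.sf.net"/>
  </hyperlink_begin>
  <text_run>Click</text_run>
  <text_run emphasis="bold">here.</text_run>
  <hyperlink_end id="222" begin="111"/>
</doc>"""
    print(ET.tostring(convert(ET.fromstring(source)), encoding="unicode"))
```

With Tracy's sample as input, this builds a cod root holding one HyperLink whose xlink:href is http://www.sf.net, containing the text "Click" followed by <b>here.</b>. It does not insert the space between the two runs that the target example shows; whitespace handling is left open here.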

(If Jeni isn't busy with mini-Jeni at the moment, maybe she'll offer this one, or Mike or someone else will. Having only poked at it, I can say only that it's somewhat trickier than the general case: you can't use the "group-starting-with" grouping criterion because your end-markers are a different element type. ;-)
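To make the group-starting-with difficulty concrete, here is a hypothetical Python emulation of that grouping criterion's semantics (a new group begins at every item matching the pattern; leading non-matching items form the first group). The sibling list below is Tracy's sample reduced to element names, plus one assumed trailing text_run after the end marker, which her snip does not contain.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def group_starting_with(items: Iterable[T], starts: Callable[[T], bool]) -> List[List[T]]:
    """Emulate XSLT 2.0 group-starting-with: each item matching `starts`
    opens a new group; leading non-matching items form the first group."""
    groups: List[List[T]] = []
    for item in items:
        if starts(item) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


# Sibling element names from Tracy's sample, plus one hypothetical
# trailing text_run that follows the end marker.
siblings = ["hyperlink_begin", "text_run", "text_run", "hyperlink_end", "text_run"]

# Begin-only criterion: the link's group runs on past hyperlink_end.
begin_only = group_starting_with(siblings, lambda n: n == "hyperlink_begin")

# Either marker opens a group, so the end marker cuts the link's group off.
either = group_starting_with(siblings, lambda n: n in ("hyperlink_begin", "hyperlink_end"))

print(begin_only)
print(either)
```

With the begin-only criterion, the group opened by hyperlink_begin swallows the end marker and everything after it. Letting either marker open a group is one possible workaround (an assumption here, not something proposed in the thread); the groups that start with hyperlink_end then still have to be unwrapped separately.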

If you can't assume these are siblings, then you're in uncharted territory ("Here be Dragons"). You could press XSLT into service as a tag-writing application (this requires a serializer to produce the output, plus the dreaded "disable-output-escaping" feature to write the tags as raw text), but that cannot guarantee well-formed output. In fact, if you have to do this (because no grouping technique applies), you should assume your output will be well-formed XML only by accident.
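To see why tag-writing cannot guarantee well-formed output, here is a small sketch, again in Python rather than XSLT. The write_tags function is a hypothetical stand-in for a stylesheet that emits the markers as raw tag strings (the moral equivalent of disable-output-escaping), and the para elements in the input are assumed for illustration: the link starts in one paragraph and ends in the next.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def write_tags(elem, out):
    """Hypothetical tag-writer: copy elem as text, but emit each marker as a
    raw start or end tag string. Attributes are dropped for brevity."""
    if elem.tag == "hyperlink_begin":
        out.append("<HyperLink>")
        return
    if elem.tag == "hyperlink_end":
        out.append("</HyperLink>")
        return
    out.append(f"<{elem.tag}>")
    if elem.text:
        out.append(escape(elem.text))
    for child in elem:
        write_tags(child, out)
        if child.tail:
            out.append(escape(child.tail))  # text following the child
    out.append(f"</{elem.tag}>")


# Hypothetical input where the markers are NOT siblings.
source = ET.fromstring(
    '<doc><para>See <hyperlink_begin id="111" end="222"/>this</para>'
    '<para>page<hyperlink_end id="222" begin="111"/> for details.</para></doc>'
)

parts = []
write_tags(source, parts)
result = "".join(parts)
print(result)

# Re-parsing the written tags shows whether the result is well-formed.
try:
    ET.fromstring(result)
    print("well-formed")
except ET.ParseError as err:
    print("not well-formed:", err)
```

The writer faithfully emits every tag it was asked for, but the HyperLink start tag opens inside the first para and closes inside the second, so the result overlaps the existing hierarchy and fails to parse. Nothing in the tag-writing approach itself prevents this.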

Cheers,
Wendell


======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================