xsl-list

RE: Using XSLT to add markup to a document

2003-07-03 14:46:38
Jim,

Charles is correct (this is an example of "up-translation"), *but*....

This problem has been solved with recursive string-crunching techniques. I don't have time to do the exposition now, but I think there may be something in EXSLT that does this. It's not pretty, or efficient, but it works. (I did this way back when cutting my teeth on the language, making a concordancer out of XSLT.)

Generally, the approach is a template that takes a string as a parameter and tests it with the contains() function. If the test fails, copy the string to output. If it passes, use substring-before() to copy the front part to output, then make your node, then pass the remainder (what substring-after() returns) back to the template recursively as a parameter.
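
For illustration, here is a minimal XSLT 1.0 sketch of that recursion. It is not the concordancer code mentioned above; the template name mark-word and the parameters word and text are made up for this example, and the <special> element is borrowed from Jim's sample output. It handles a single search word and assumes an identity transform for everything else.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the one word to wrap in <special>. -->
  <xsl:param name="word" select="'document'"/>

  <!-- Identity template: copy elements, attributes, comments, etc. unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes are routed through the recursive template instead.
       The explicit priority avoids a conflict with the identity template. -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="mark-word">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="mark-word">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- The empty-string guard prevents endless recursion,
           since contains() is true for an empty search string. -->
      <xsl:when test="$word != '' and contains($text, $word)">
        <!-- Copy the part before the first occurrence. -->
        <xsl:value-of select="substring-before($text, $word)"/>
        <!-- Make the new node. -->
        <special><xsl:value-of select="$word"/></special>
        <!-- Recurse on the remainder. -->
        <xsl:call-template name="mark-word">
          <xsl:with-param name="text" select="substring-after($text, $word)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No occurrence left: copy the string as it is. -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Note that this matches substrings, not whole words, so "documents" would come out as <special>document</special>s. Marking several words would need either one pass per word or a more elaborate template that looks for the earliest match among the candidates. The recursion depth also grows with the number of matches in a text node.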

Charles is also correct to observe Dimitre has a sophisticated way to do this, optimized for efficiency -- so be sure to check out FXSL as well.

I also agree that there are other tools that may be far better at this, especially if you have large documents and this is the only change to make.

Cheers,
Wendell

At 04:52 PM 7/3/2003, Charles wrote:
As Ed used to say to Johnny, "You are correct, sir!". XSLT 1.0 is good at working with elements and attributes and so forth. What you want to do is create elements where none existed before and where there are no XML semantic cues to identify them. You have two choices: upgrade to the latest Saxon version, which supports XSLT 2.0 features like regular expression matching, or go the non-XSL route and pre-process your XML document with Perl (or another language that supports regular expression matching) and add the new markup before you run your XSLT transform. Having said that, I expect that Dimitre Novatchev will now tell you that he has just the thing you need in his FXSL bag o' tricks.
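
As a rough illustration of the XSLT 2.0 route, the sketch below uses xsl:analyze-string as it is defined in the XSLT 2.0 Recommendation. Draft-era Saxon releases may differ in detail, so treat it as an assumption to check against your processor. The word list in the regex attribute and the <special> element come from Jim's example.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not a text node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node into matching and non-matching pieces. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="markup|document">
      <xsl:matching-substring>
        <!-- Inside this instruction, "." is the matched substring. -->
        <special><xsl:value-of select="."/></special>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Everything between matches is copied through as text. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The regular expression dialect here has no \b word-boundary anchor, so this sketch also matches inside longer words such as "documents". Restricting it to whole words would take a more careful pattern.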
--
Charles Knell
cknell(_at_)onebox(_dot_)com - email



-----Original Message-----
From:     Jim Melton <jim(_dot_)melton(_at_)acm(_dot_)org>
Sent:     Thu, 03 Jul 2003 14:27:36 -0600
To:       xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Cc:       jim(_dot_)melton(_at_)acm(_dot_)org
Subject:  [xsl] Using XSLT to add markup to a document

Gentlepeople,

I'm struggling with a problem that I fear isn't easily solved with XSLT,
but there are many experts on this list who might be able to help.  The
brief summary of my problem is that I want to find certain words that
appear in paragraphs throughout a very large (XML) document and mark up
those words without making any other changes to my document.

For example, consider a document with the following fragment:

<para>
This is a sample document that deals with markup of <emph>text</emph>.
</para>
<para>
When one applies <emph>markup</emph> to a large document, one is faced with
a <def>time-consuming</def> effort.
</para>

If one of the words to which I wish to apply markup is "markup" and another
is "document", then I would want the result to be something like this:

<para>
This is a sample <special>document</special> that deals with
<special>markup</special> of <emph>text</emph>.
</para>
<para>
When one applies <emph><special>markup</special></emph> to a large
<special>document</special>, one is faced with a <def>time-consuming</def>
effort.
</para>

As you see from this example, I want to *add* markup to the words I have
found where they appear in my result tree, but copy everything else in my
document to the output tree unchanged.

I tend to use Saxon (currently using 6.5.2) as my primary XSLT engine, but
I also have Microsoft's MSXML 4.0 (and could undoubtedly find others if
required to do so).


======================================================================
Wendell Piez                 mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list