> In general I think the one-pass solution is often more complicated and
> runs the risk of not being extensible when the problem "evolves".
In general maybe, but not in this specific case...
I wouldn't have offered this solution if it weren't obviously much simpler
than the XSLT 3.0 one already offered.
I would say to everyone: Stick to the KISS principle and believe your eyes
(and timings) :)
--
Cheers,
Dimitre Novatchev
On Sat, Aug 15, 2020 at 2:16 AM Michael Kay mike(_at_)saxonica(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
This problem comes up from time to time, and it's not easy.
There seem to be three general approaches:
(a) turn the punctuation into markup (e.g. turn ":" into <colon/>), then
do the manipulation on a tree of nodes
(b) turn the markup into punctuation, then do the manipulation on the
resulting text.
(c) do it all in one pass
I see that Graydon's solution uses serialize() and parse-xml(), so that's
a modern approach to doing (b); while Dimitre's solution does (c). In
general I think the one-pass solution is often more complicated and runs
the risk of not being extensible when the problem "evolves".
One of the things that can cause the problem to "evolve" is error
handling: dealing with situations where the input isn't quite as simple as
in your example. For example, multiple colons, no colons, colons that are
there for a different purpose, etc. You haven't included any such cases in
your requirements statement.
If we ignore error handling, this example of the problem is simpler than
some because the ":" is always going to be in an immediate child text node;
we've seen other examples (like splitting a table) where we need to look
for conditions much deeper in the structure. This is probably what makes a
one-pass solution feasible in this case.
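To illustrate why a single pass is feasible when the colon sits in an immediate child text node, here is a minimal XSLT 2.0 sketch. It is not Dimitre's stylesheet, which is not reproduced in this message. It assumes exactly one child text node of <title> contains a colon, splits on ":" rather than ": " (the sample input wraps right after the colon), and does no error handling.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="title">
    <!-- Assumption: exactly one child text node of <title> contains a colon.
         Zero or several colons are not handled. -->
    <xsl:variable name="split" select="text()[contains(., ':')][1]"/>
    <title>
      <!-- Everything before the splitting text node, in document order. -->
      <xsl:apply-templates select="$split/preceding-sibling::node()"/>
      <xsl:value-of select="lower-case(substring-before($split, ':'))"/>
    </title>
    <subtitle>
      <!-- Drop the whitespace that follows the colon. -->
      <xsl:value-of
          select="lower-case(replace(substring-after($split, ':'), '^\s+', ''))"/>
      <xsl:apply-templates select="$split/following-sibling::node()"/>
    </subtitle>
  </xsl:template>

  <!-- Keep the <i> markup and recurse into its content. -->
  <xsl:template match="i">
    <i><xsl:apply-templates/></i>
  </xsl:template>

  <!-- The string manipulation from the question: upper-case to lower-case. -->
  <xsl:template match="text()">
    <xsl:value-of select="lower-case(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

The sketch relies on the colon being a direct child of <title>. A colon inside <i>, or a title with no colon at all, would need extra templates, which is the kind of "evolution" discussed above.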
Intuitively, my feeling is that (a) is the most rigorous approach, the one
that is least likely to fail because of unanticipated input conditions. For
example, Graydon's solution fails if the input contains tags with
upper-case names, or if it contains comments with a colon in the text.
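A rough sketch of approach (a) in XSLT 2.0 follows, under stated assumptions: the marker element name "colon" is taken from the description of (a) above, the mode names "mark" and "emit" are made up for illustration, and only the first colon in each child text node of <title> is treated as a separator. It is a sketch, not a tested production stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="title">
    <!-- Pass 1: copy the content into a temporary tree, turning the
         colon into a <colon/> marker element. -->
    <xsl:variable name="marked">
      <xsl:apply-templates mode="mark"/>
    </xsl:variable>
    <!-- Pass 2: start a new group at each marker. The first group becomes
         <title>, any later group becomes <subtitle> (assumption: the
         content does not begin with a colon). -->
    <xsl:for-each-group select="$marked/node()" group-starting-with="colon">
      <xsl:element name="{if (position() = 1) then 'title' else 'subtitle'}">
        <xsl:apply-templates select="current-group()[not(self::colon)]"
                             mode="emit"/>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Only text nodes that are direct children of <title> are split. -->
  <xsl:template match="title/text()[contains(., ':')]" mode="mark">
    <xsl:value-of select="substring-before(., ':')"/>
    <colon/>
    <xsl:value-of select="replace(substring-after(., ':'), '^\s+', '')"/>
  </xsl:template>

  <!-- Everything else is copied unchanged in pass 1. -->
  <xsl:template match="node()" mode="mark">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Pass 2 output: keep elements such as <i>, lower-case the text. -->
  <xsl:template match="*" mode="emit">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="emit"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" mode="emit">
    <xsl:value-of select="lower-case(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the split works on nodes rather than on serialized text, a colon inside a comment or an element name in upper case does not disturb the grouping. In this sketch comments are simply dropped in the second pass by the built-in template rules.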
Michael Kay
Saxonica
On 15 Aug 2020, at 03:16, Wolfhart Totschnig
wolfhart(_dot_)totschnig(_at_)mail(_dot_)udp(_dot_)cl
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
wrote:
Dear list,
I would like to ask for your help with the following mixed-content
problem. I am receiving, from an external source, data in the following
form:
<title>THE TITLE OF THE BOOK WITH SOME <i>ITALICS</i> AND SOME MORE WORDS:
THE SUBTITLE OF THE BOOK WITH SOME <i>ITALICS</i></title>
What I would like to do is
1) separate the title from the subtitle (i.e., divide the data at the
colon) and put each in a separate element node;
2) all the while maintaining the <i> markup;
3) and perform certain string manipulations on all of the text nodes; for
the purposes of this post, I will use the example of converting upper-case
to lower-case.
So the desired output is the following:
<title>the title of the book with some <i>italics</i> and some more
words</title>
<subtitle>the subtitle of the book with some <i>italics</i></subtitle>
How can this be done?
I know that I can perform string manipulations while maintaining the <i>
markup with templates, i.e., <xsl:template match="text()"/> and
<xsl:template match="i"/>. But in this case I do not know how to divide the
data at the colon. And I know that I can divide the data at the colon with
<xsl:value-of select="substring-before(.,': ')"/>, but then I lose the <i>
markup. So I am at a loss.
Thanks in advance for your help!
Wolfhart
XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/782854>