Re: [xsl] When to use text()

On Sat, Mar 22, 2014 at 2:48 PM, Abel Braaksma (Exselt) 
<abel(_at_)exselt(_dot_)net> wrote:

Interesting thoughts.

When designing a language, there will always be a lot of discussion
about the choice of words for keywords, terminology, language
constructs. Take C#: they used the word "assembly" for physically
separated packages, and the word "namespace" for logical separations. To
this day, many (starting) programmers have a hard time understanding
those concepts, not least because "assembly" reminds them of
assembly language and "namespace" of XML namespaces. Similarly, why
did they choose the keyword "fixed" when the meaning is to "pin" a variable?

Those discussions will never end, and should never end. It will always
remind language designers to think carefully about the words they choose.


Amen.

Sometimes though bad habits are subliminally copied.

You shut down a Windows machine by clicking the Start button.
You can shut down an eXist database by running start.jar.

For me the most succinct and resonant commentary in the whole thread
has been largely ignored.

<quote author="David Sewell">(Applying the Sapir-Whorf hypothesis to
programming languages, i.e. the way a language encodes things
influences the way we think about them.)</quote>


In this particular case, the working group at the time had a conflict of
interest. There was XML, which was already defined, which had text
nodes. And there was XPath (not XSLT) that required a method for
selecting those text nodes. Since they were already called text nodes in
DOM [1], it made sense to follow this nomenclature. Note that, in the
XML Infoset, they did not exist, nor in the original XML specifications.
Instead, they were called character information items[2], which referred
to the individual characters, not the whole node.

On the other hand they had a requirement to be able to atomize nodes, in
other words, to turn them into what is commonly known in computing as a
"string". There are languages that use the keyword TEXT when referring
to strings, but many common languages use the keyword string.

What were they to do? Are there other alternatives? Text nodes needed a
name and atomized text nodes too. Both were an important requirement,
because if you would always atomize, then how can you query mixed content?

An important distinction is that text() is a KindTest (it tests
whether a given node is a text node; as such, it in fact returns a
boolean), and string() and string(x) are functions that take an implicit
or explicit argument and turn it into a string.
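
To make the distinction concrete, here is a minimal sketch in Python,
using only the standard library's ElementTree. ElementTree does not
implement text() or string(), so the two helper functions below are
hypothetical approximations of the XPath behaviour, and the sample
document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content sample, not taken from the thread.
para = ET.fromstring("<para>Use <em>text()</em> with care.</para>")


def text_node_children(elem):
    """Approximate XPath elem/text(): the text nodes that are direct children."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        # ElementTree stores text that follows a child element as its "tail".
        if child.tail:
            nodes.append(child.tail)
    return nodes


def string_value(elem):
    """Approximate XPath string(elem): all descendant text, concatenated."""
    return "".join(elem.itertext())


print(text_node_children(para))  # ['Use ', ' with care.']
print(string_value(para))        # Use text() with care.
```

The first helper yields two separate pieces and skips the content of
the em element, while the second yields one string covering everything
inside para. That is roughly the gap between selecting text nodes and
atomizing an element.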

One might argue that you could use is-text() and is-comment(), and
conversely convert-to-string and the like. But that doesn't work well in
an expression such as para/em/is-text() or even para/em[is-text()], because
the semantics here are not "is" but "has" (select all the text nodes that
text children). And my argument against convert-to-string would be that
it is annoyingly long, but that's just me. My argument against string()
itself is that it looks too much like a constructor function, which it
is not.
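
The two readings of "has" (text() as a path step versus text() inside
a predicate) can be sketched in Python with the standard library's
ElementTree. This is only an approximation under stated assumptions:
ElementTree has no text() test, so the helper text_children and the
sample document are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second em element has no text at all.
para = ET.fromstring("<para><em>alpha</em><em/><em>beta</em></para>")


def text_children(elem):
    """Text nodes that are direct children of elem (its text plus child tails)."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]


ems = para.findall("em")

# Roughly para/em/text(): the text nodes whose parent is an em.
em_text_nodes = [t for em in ems for t in text_children(em)]

# Roughly para/em[text()]: the em elements that have at least one text child.
ems_with_text = [em for em in ems if text_children(em)]

print(em_text_nodes)       # ['alpha', 'beta']
print(len(ems_with_text))  # 2
```

In the first case the result is the text itself; in the second it is
the em elements, with the empty one filtered out.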

I'm not saying that the choice of words is perfect, but I wanted to
point out that the choice of words is never an easy one. W3C standards
are created by consensus of all the members and it is an open process
where non-members can submit bug reports to draft standards and the
working group is required to look into them. If you have a strong
argument, they are likely to take your argument seriously.


Yes, this is the sort of process that one would have imagined. I would
describe it as compromise rather than consensus. My interpretation:
with compromise, nobody gets all they want and you end up with something
that all parties agree to live with. With consensus, the parties agree
upon what the best thing is in the circumstances. We are going out for
a meal: I like Chinese, you like Indian, so we compromise and settle on
Italian. Is that the best for all concerned? Probably not, so it's
not a consensus.

Back from digression.

IMHO there is an overarching viewpoint and here is how I would present it.

Extracting text from an XML document is the hello world of XSLT.
text() would appear to be an obvious way of doing that and it's really
important that it entails no surprises. If I were an XSLT antagonist,
that is exactly the sort of thing I would home in on to portray the
language as arcane, difficult to use and not suitable for my project.


That said, I invite you and everyone on this list or elsewhere to look
at the current XSLT 3.0 Last Call Working Draft[3]. Even now there are
still some open bugs on choices of terms and keywords. It is still open
for bug-reports from anyone, which you can file into W3C's bugzilla[4]
(signing up is easy).


I guess I have to try and make time.

Small disclaimer: I was not a member of the WG at the time they needed
to make a choice for the string() function and text() kindtest, so the
road to consensus I laid out above may not be the actual road that led
to consensus.

Cheers,

Abel Braaksma
Exselt XSLT 3.0 processor
http://exselt.net

PS: you don't need to look up the spec to remind you of text() vs
string(),


Before this thread I'd never given it a moment's thought. Confession:
after this thread that situation is unlikely to have changed.

Why? Because I satisfice, and it's very tempting to preface that with
"like most other programmers in the world". It's really instructive to
look at the first two sentences of the Wikipedia definition as it
exposes the contrasting viewpoints in this thread.
http://en.wikipedia.org/wiki/Satisficing

I wonder what people's response was, and what they did, back in the
pre-internet days when there wasn't a hyperlinked language
specification available at a click of a button. Programming still got
done then.

Personally I blame Michael Kay. The man is his own worst enemy. If he
wants people to read the spec then he should stop producing such
assiduous and fantastically written textbooks.


in fact, just about any book on XSLT clearly explains their
semantics and pitfalls. And you are right, people starting out with a
language will start with a tutorial book, and that is exactly where they
learn this distinction.


Never read XSLT for Dummies then, did you?

Yeah, haha, explains a lot, but I wouldn't knock it.

It was only after reading it that I was able to grok anything in the
other books I had tried to learn XSLT from.
