Re: Was: [xsl] mode and moved to Namespaces

2011-04-20 10:42:50
Abel, Florent, All,

Ok guys, I am giving up. I realize that I cannot effectively contribute here.

From my viewpoint, most who replied seem to miss my point and focus on a detail hardly relevant to the issue. I am sorry for misreading Andrew, and apologize, but feel plenty misread myself.

The point was about having more than 2 namespaces in a stylesheet, as initially proposed by Ken, in a response to a question. When I noted that just a Boolean set of public/private namespaces may not cover the range of effective access types required in many cases, while assuming and noting that Ken was surely quite aware of this also, and I also noted that I had a case where I was using around 80 namespaces in a single stylesheet, I was asked to show them. Andrew also noted that he used 8 at most, if I read him right, now. I submitted a list of 76, including 34 that are imposed by outside libraries and existing well known XML standards, 39 related to knowledge domains (e.g. math, time, space, security, music, primary keys, virtual worlds, etc.) currently managed by the application, as well as 4 at the bottom of the list, used for a translation dictionary with specific constraints. As I submitted the namespace list, and following remarks by Michael and Gerrit that Saxon used linear searches for namespaces, rather than binary, for example, I also proposed enhancing namespace support, especially for efficiency and possible hierarchical namespaces.

Unfortunately, the overwhelming majority of questions and comments that I received, inside and outside of this XSL-list, have been about the translation dictionary, to which I tried to answer with simplified examples, only to be told that I was simply wrong and that what I was doing was just wrong, even if it solved a problem in a specific context.

So, fine. I gave examples using the prefixes to simplify answers to comments about the complexity of querying the dictionary with namespaces. I also mentioned that using the URIs was safer and not much overhead.

Unfortunately, no one seems interested in the real issues, and most seem to want to quibble over a detail of little relevance to those issues. Forget the 4 dictionary namespaces if it breaks your religion; they are a very small and inconsequential detail, relative to the issues at stake. 72 namespaces is still 9 times 8 namespaces, and 36 times 2 namespaces. You may know your namespaces, as well as what is right and what is wrong, but please check focus, context, and perspective.

As you say, I may be wrong, but I get the feeling that with 40 years of development and over ten in XSLT, a stylesheet of over 20K lines, or more specifically an integrated set of 25 stylesheets with an average of 1000 lines each, using around 80 namespaces to manage over a terabyte of XML elements, under many XML-based standards, in parallel transformation pipelines, to do Knowledge Resource Entitlement, Modeling, Management, and Sharing, I, at least, can make out the difference between a namespace prefix and its URI.

Maybe I can't write, or maybe I am just plainly and simply wrong whatsoever, or maybe this is a list for newbie issues, but in any case, I am sorry that we are all unfortunately wasting valuable energy.

Thank you for your efforts and please forgive mine.

Regards,
ac



Hi ac,

This thread is rather long, so please forgive me if I've misunderstood
anything, but I'd like to add my thoughts to the discussion.

You seem to want to use namespaces as tags for names, which is not
what they're intended for. As a reason for doing so you cite space
saving, but if space is an issue, don't use XML. If it's about the
memory footprint, it doesn't matter whether your nodes have
namespaces, because practically every node takes approximately the
same amount of memory, regardless of its name or kind (there's an older
thread by Michael Kay where he explains how much memory each node
takes). In other words, your argument about size doesn't hold.

A namespace is prefix-agnostic. That means that if <en:word /> is
bound to the namespace "http://example.com/french", and <fr:word /> is
bound to it too, both names are equal. Treating them differently is
a design error.
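
A minimal sketch of this point, using Python's standard ElementTree
parser. The wrapper element <dic> is an assumption added only to make
the fragment a complete document; the URI is the one from the
paragraph above.

```python
import xml.etree.ElementTree as ET

# Two different prefixes bound to the same namespace URI.
# The <dic> wrapper is hypothetical; it only makes the sample well-formed.
DOC = """
<dic xmlns:en="http://example.com/french"
     xmlns:fr="http://example.com/french">
  <en:word/>
  <fr:word/>
</dic>
"""

root = ET.fromstring(DOC)
first, second = list(root)

# ElementTree expands every name to "{namespace-uri}local-name";
# the prefix plays no part in the comparison.
same_name = first.tag == second.tag
print(first.tag)
print(same_name)
```

A namespace-aware parser reports the same name for both elements, so
any application logic that depends on the prefix has to work around
the parser rather than with it.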

The real problem, however, comes from portability and
understandability. You redefine namespaces as something that's nothing
more than a tag or prefix. That makes your solution unportable and no
longer machine readable. For example, if a simple identity transform
took all namespace prefixes and replaced them with ns1, ns2 etc. (but
left the namespace itself, and hence the qualified names, intact),
your application would fail. However, such transformations are quite
common in XML and totally legal.
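
The failure mode can be sketched as follows, assuming a hypothetical
URI for the French vocabulary and a single attribute. The helper names
by_prefix and by_uri are illustrative only; the standard minidom
parser is used because it keeps both the prefixed name and the
namespace URI of each attribute.

```python
from xml.dom import minidom

FR_URI = "http://example.com/fr"  # hypothetical URI, not from the thread

ORIGINAL = '<word xmlns:fr="http://example.com/fr" fr:title="M."/>'
# The same document after a prefix-rewriting pass:
# the prefix changed, the namespace did not.
RENAMED = '<word xmlns:ns1="http://example.com/fr" ns1:title="M."/>'


def by_prefix(xml_text):
    """Fragile: tests the literal prefix, like starts-with(name(), 'fr:')."""
    word = minidom.parseString(xml_text).documentElement
    return [a.value for a in word.attributes.values()
            if a.name.startswith("fr:")]


def by_uri(xml_text):
    """Robust: tests the namespace URI, like namespace-uri() = '...'."""
    word = minidom.parseString(xml_text).documentElement
    return [a.value for a in word.attributes.values()
            if a.namespaceURI == FR_URI]


print(by_prefix(ORIGINAL), by_prefix(RENAMED))
print(by_uri(ORIGINAL), by_uri(RENAMED))
```

The prefix-based lookup finds the value only in the original
document, while the URI-based lookup finds it in both.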

By redefining what a namespace means (or, more specifically, by
ignoring its real meaning and making it part of the local-name, which
is basically what you are doing), you stop using XML the way it was
meant to be used. Your XML in and of itself is still compliant, but your
applications and how they treat XML are not. That's a choice, but if
you go down that path, you may just as well choose your own format,
which will give you far better results in performance, space and
requirements.

----

Back to your real problem: suppose we accept that you need to use XML
and that you do not want to abuse namespaces for something they're
not. How could we tackle your issue? I'd go for a straight structure
and use what's already there:

<word type="title" xml:lang="en-GB" gender="female">Mrs</word>
<word type="title" xml:lang="fr-FR" gender="female">Mme</word>

This is the approach Microsoft chose (or at least a similar one) in
Word-ML, which looks big but is quite workable. Now, suppose you want
to minimize the disk footprint (as already said, the memory footprint
will be largely the same regardless); you could do something like this:

<word type="title" en-m="Mr" en-f="Mrs" fr-m="M." fr-f="Mme"/>

As it turns out, this is effectively smaller than your
namespace-oriented approach. If you really want the type of the word
in each and every attribute name and split the attribute name later,
you can do that, but code smell ahead! Something like:

<word en-title="Mr" en-f-title="Mrs" fr-title="M." fr-f-title="Mme"
... />

But really, you shouldn't go down that path: it has exactly the same
drawbacks as your namespace approach (albeit being slightly more
extensible). It will backfire once you start using it.
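
As a rough sketch of how the straight structure (one <word> element
per variant, with type, xml:lang and gender attributes) can be queried
with the language chosen at runtime: the <dic> wrapper, the two male
entries with gender="male", and the lookup helper are assumptions
added for illustration.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined; ElementTree reports xml:lang under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical dictionary: the first two entries come from the message,
# the male entries and the <dic> wrapper are assumed.
DIC = """
<dic>
  <word type="title" xml:lang="en-GB" gender="female">Mrs</word>
  <word type="title" xml:lang="fr-FR" gender="female">Mme</word>
  <word type="title" xml:lang="en-GB" gender="male">Mr</word>
  <word type="title" xml:lang="fr-FR" gender="male">M.</word>
</dic>
"""


def lookup(root, lang, **criteria):
    """Return the text of every <word> in `lang` matching all criteria."""
    return [
        word.text
        for word in root.iter("word")
        if word.get(XML_LANG) == lang
        and all(word.get(key) == value for key, value in criteria.items())
    ]


root = ET.fromstring(DIC)
print(lookup(root, "fr-FR", type="title", gender="female"))
print(lookup(root, "en-GB", type="title"))
```

Because the language is an ordinary attribute value, it can be passed
in as a parameter instead of being baked into the names used in each
query.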

Moral of this story: use XML for what it is for: a verbose and
descriptive method of describing data. If space is of the essence, don't
use XML, as it will work against you. Use namespaces for what they're
meant for: separating semantically different sets of names that are
supposed to be treated differently (compare the XSLT namespace and the
SVG namespace: they require different applications).

Kind regards,

Abel Braaksma



On 20-4-2011 3:01, ac wrote:
Hi Jirka,

I appreciate your time, consideration, suggestions, and arguments.

You are right, there is a lookup cost, and this is not the way I
prefer to use namespaces.  OTOH, the space saving and associated
overhead saving can justify the lookup cost for something that can
get large and needs to stay in busy memory, at least for a while.

It would be much nicer if namespaces could be further supported,
including support for hierarchical namespaces, as well as namespace
optimization.  Namespaces are, apart from comments, one of the three
basic XML constructs. Three isn't much, which is fine, but each
should be maximized to help better satisfy application requirements.

I do not doubt that you are open-minded and I certainly appreciate
your constructive comments.  In fact I agree with them.  I do admit
that "simply wrong" did not allow me to understand and contribute
technically.  But it is looking much better now.

I also realize that matching on the names is risky and would be
better addressed through the URIs.  The added cost is not high enough
to justify the risk, and the space saving is probably still worth the
effort, depending on the number of languages that need to be
supported, the size of the vocabulary, and the memory constraints.

Still, as everything is a trade off, I would still maintain, given
all constraints, that this is another valid use case for namespaces,
when it applies.  I would also recommend that we consider how
namespaces can better fulfill more useful roles in XML, including how
they can be expanded, and more efficiently supported.

There is a real conceptual need for namespaces and it may be that we
are just starting to better realize it.

Regards,
ac



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

ac wrote:

The current translation dictionary is setup somewhat like:
...
<word en:title="Mr" f-en:title="Mrs" fr:title="M." f-fr:title="Mme"
... />
<word en:noun="chair" fr:noun="chaise" ... />
...

all feminine variants can be returned with:
/dic/word/@*[starts-with(name(.), 'f-')]
Such lookups will tend to be quite slow because matching on the name of
an element/attribute can't be done using the dictionary -- highly
efficient XSLT implementations don't store element/attribute names for
each node; they store just a number pointing to a dictionary entry with
the real qualified name. This saves memory and makes matching on a name
very fast. But if the name is not directly present in the XPath
expression, such fast matching can't be done.
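
To illustrate the idea of such a name dictionary, here is a toy
sketch. It is not the data structure of any particular XSLT processor;
the NamePool class, the URIs and the helper functions are all
hypothetical.

```python
class NamePool:
    """Maps each (namespace URI, local name) pair to a small integer code."""

    def __init__(self):
        self._codes = {}
        self._names = []

    def code_for(self, uri, local):
        # Allocate a new code the first time a name is seen.
        key = (uri, local)
        if key not in self._codes:
            self._codes[key] = len(self._names)
            self._names.append(key)
        return self._codes[key]

    def name_of(self, code):
        return self._names[code]


pool = NamePool()
EN = "http://example.com/en"      # hypothetical URIs
F_EN = "http://example.com/f-en"

# Each attribute node stores only (name code, value), not the name itself.
attributes = [
    (pool.code_for(EN, "title"), "Mr"),
    (pool.code_for(F_EN, "title"), "Mrs"),
    (pool.code_for(EN, "noun"), "chair"),
]


def match_exact(uri, local):
    """Name known up front: one pool lookup, then integer comparisons."""
    wanted = pool.code_for(uri, local)
    return [value for code, value in attributes if code == wanted]


def match_by_predicate(predicate):
    """Name tested by a string predicate: every name must be decoded first."""
    return [value for code, value in attributes
            if predicate(*pool.name_of(code))]


print(match_exact(EN, "title"))
print(match_by_predicate(lambda uri, local: uri.endswith("/f-en")))
```

The exact match compares integers per node, whereas the predicate form
has to reconstruct and inspect each name as a string, which is the
kind of cost described above for tests such as starts-with(name(.), 'f-').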

all French feminine can be returned with
/dic/word/@f-fr:*
all French feminine adjectives can be returned with
/dic/word/@f-fr:adjective
all translated English words can be returned with
/dic/word/@en:*
The trouble with such an approach is that you can't change the language
at runtime. You have to pregenerate all queries before running the
transformation or use dynamic XPath evaluation (which is not part of
the XSLT standard yet).

all English nouns, whatever gender, can be obtained with something like
/dic/word/@*:noun[contains(name(), 'en:')]
If you are using namespaces then this code is not correct. You should
match on the namespace name, not the actual prefix used. So the query
should be more like:

/dic/word/@*:noun[namespace-uri() = 'whatever URI was assigned to en']
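
The same URI-based selection can be sketched outside XPath. The
namespace URIs below are hypothetical stand-ins for "whatever URI was
assigned" to each prefix, and attributes_in is an illustrative helper,
not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for the dictionary prefixes.
EN_URI = "http://example.com/dic/en"
F_EN_URI = "http://example.com/dic/f-en"

DIC = """
<dic xmlns:en="http://example.com/dic/en"
     xmlns:f-en="http://example.com/dic/f-en"
     xmlns:fr="http://example.com/dic/fr"
     xmlns:f-fr="http://example.com/dic/f-fr">
  <word en:title="Mr" f-en:title="Mrs" fr:title="M." f-fr:title="Mme"/>
  <word en:noun="chair" fr:noun="chaise"/>
</dic>
"""


def attributes_in(root, uris, local):
    """Values of attributes named `local` whose namespace URI is in `uris`."""
    wanted = {f"{{{uri}}}{local}" for uri in uris}
    return [
        value
        for word in root.iter("word")
        for name, value in word.attrib.items()
        if name in wanted
    ]


root = ET.fromstring(DIC)
english_nouns = attributes_in(root, [EN_URI, F_EN_URI], "noun")
english_titles = attributes_in(root, [EN_URI, F_EN_URI], "title")
print(english_nouns)
print(sorted(english_titles))
```

Since only URIs and local names are compared, the result does not
change if the prefixes in the dictionary file are renamed.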

It must be good to know what is right from what is wrong,
especially with an absolute perspective.
I have to admit that I have always had some disbelief about absolute
beliefs, but I will keep an open mind, at least just in case.
I consider myself very open-minded. Your usage of namespaces in this
particular case surely works for you, but it's a misuse of namespaces.
They were not designed for this, and using them this way has several
engineering flaws.

- -- -
------------------------------------------------------------------
   Jirka Kosek      e-mail: jirka(_at_)kosek(_dot_)cz      http://xmlguru.cz
- ------------------------------------------------------------------
        Professional XML consulting and training services
   DocBook customization, custom XSLT/XSL-FO document processing
- ------------------------------------------------------------------
  OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
- ------------------------------------------------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2t+6cACgkQzwmSw7n0dR4F/ACfRIwtkthd9SXVzk4fV+iKoHoe
XbkAnR6T4sWLdIzdyi/+J9gjIr/V8jEd
=1Loa
-----END PGP SIGNATURE-----

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



