xsl-list
[Top] [All Lists]

Re: [xsl] finding a minimum set of references

2013-01-09 06:57:18
This can be done in three steps:

 1. Get the distinct values of concat(@area,'+',@site) for all tree elements.

 2. Get a sorted sequence of the corresponding `loc` elements that
were produced in step 1. The sort is by the count of `tree` elements
that have that particular loc element as descendant.

 3. Recursively add to the "minList" the top of the sorted loc
elements that isn't yet in this "minList". Stop condition -- when
there is nothing more to add.

I believe that this would really produce the minimum sequence of
locations -- due to the fact that we pick locations in decreasing
order of trees that reference them.

I would be willing to produce the code that implements this algorithm,
but I would need to be provided with a complete (but not too long)
source XML document -- so that the result could be easily verified.


Cheers,
Dimitre


On Tue, Jan 8, 2013 at 5:37 PM, Graydon <graydon(_at_)marost(_dot_)ca> wrote:
So I have a lot (~600) tree elements that look like:

    <tree parent="attribution">
        <count count="1"/>
        <locations>
            <loc area="loiprop" cite="2010-07-prop10"/>
        </locations>
    </tree>
    <tree parent="block">
        <count count="10"/>
        <locations>
            <loc area="loiprop" cite="2004-09-bud2004"/>
            <loc area="loiprop" cite="2010-07-prop10"/>
            <loc area="loiprop" cite="2010-08-allprop"/>
        </locations>
    </tree>
    <tree parent="block">
        <key>section</key>
        <count count="6689"/>
        <locations>
            <loc area="CRCc945-en" cite="2606"/>
            <loc area="CRCc945-en" cite="402"/>
            <loc area="CRCc945-en" cite="6804"/>
            <loc area="CRCc945-en" cite="8301"/>
            <loc area="CRCc945-en" cite="8308.1"/>
        ....
        </locations>
        <child>section</child>
    </tree>

The @area,@cite pairs represent documents with the locations of particular 
patterns (via the @parent and the child elements of the tree elements) of 
documentation content, one tree element per pattern.  (So there are lots of 
tree/@parent="block" elements, etc.)

What's wanted is the minimum number of documents that contain _all_ the 
patterns, for output testing purposes.

So I need to produce the, or at least _a_, shortest list of <loc/> elements 
so that every (all ~600) tree element contains at least one loc element on 
the list.

This has turned out to be much less straightforward than I thought.  I can't 
shake the feeling that there's a grouping solution, but I'm not seeing it if 
there is.  So I'd appreciate any algorithm hints anybody has got; 
unfortunately, efficiency is a concern.

Demonstration that this is really an NP-complete problem also grateful 
accepted!

Thanks!
Graydon

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--