xsl-list

RE: for roger Glover..., Knowledge management XML

2003-02-10 18:04:52
Hello,

I have included the final code below for others to
experiment with.


This is an interesting problem: matching two XML data
sheets to produce one correct one. The knowledge
management aspect concerns the person names handled by
the XSL sheet, that is, the authors of publications.

Say I have the knowledge that <author>Micheal
Kay</author> is the same person as <author>M. Kay</author>,
stored in one XML data sheet in the form:
<samePersons>
<author>Micheal Kay</author> <!-- the actual correct
one that I want in the database -->
<author>Micheal</author>
<author>Micheal K.</author>
</samePersons>

I have a separate XML data sheet that simply holds all
of this "knowledge". How can I sort out or delete the
erroneous names in my current XML, which looks like
<person id="0003">
Micheal Kay
</person>

I hope I am explaining this properly. I have one XML
data sheet which records which names are right and
which are wrong. I want to delete the erroneous
elements in my main XML sheet so that only the
correct names are shown.
Also, if I delete the erroneous elements, I have to put
the correct id in the pubper element as well.

Please suggest whether I should do this while I am
generating the ids (XSL sheet shown below) or
afterwards, in a separate XSL sheet.
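
One possible way to approach the lookup, sketched here as a
separate pass that runs before the ids are generated. This is
only an illustration under assumptions: the file name
"knowledge.xml" is hypothetical, the <samePersons> groups are
assumed to sit somewhere under its root element, and the first
<author> in each group is assumed to be the correct name.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name; assumed to contain the <samePersons>
       groups, with the correct name listed first in each group -->
  <xsl:variable name="knowledge" select="document('knowledge.xml')"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace a known variant name with the first name of its group -->
  <xsl:template match="author|editor">
    <xsl:variable name="name" select="normalize-space(.)"/>
    <xsl:variable name="canonical"
        select="$knowledge//samePersons
                  [author[normalize-space(.) = $name]]/author[1]"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:choose>
        <xsl:when test="$canonical">
          <xsl:value-of select="normalize-space($canonical[1])"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Name not mentioned in the knowledge sheet: keep it -->
          <xsl:value-of select="$name"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

</xsl:transform>
```

If the names are normalized like this first, the id-generating
sheet only ever sees the correct spelling, so the erroneous
variants never receive a perid and each pubper points at the
correct person without further changes.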

Jinesh




-----------------------------------------------
final code:
<xsl:transform version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"
xmlns:xalan="http://xml.apache.org/xalan"
xalan:indent-amount="4" />
<xsl:variable name="persons">
<xsl:apply-templates
select="//publication/author[not(.=preceding::author or .=preceding::editor)]
      | //publication/editor[not(.=preceding::author or .=preceding::editor)]"
mode="generate-person"/>
</xsl:variable>

<!-- Similar to original "generate-author-id"
template, but generates entire person element-->
<xsl:template match="author|editor"
mode="generate-person">
<xsl:if test="normalize-space(.)"> <!-- this prevents
empty author/editor elements from getting ids -->
<xsl:variable name="temp"
select="concat('000000000',position())" />
    <xsl:variable name="perid"
select="substring($temp,string-length($temp)-9)"/>
    <person perid="{$perid}">
        <personname>
            <xsl:value-of select="."/>
        </personname>
    </person>
</xsl:if>
</xsl:template>

<xsl:template match="dblp">

    <dblp>
        <!-- copies the "person" elements result tree
fragment into the result tree -->
        <xsl:copy-of select="$persons"/>
        <xsl:apply-templates select="publication"/>
    </dblp>
</xsl:template>

<xsl:template match="publication">

    <!-- Same as in the original code -->
    <publication>
        <xsl:copy-of select="@*|*[not(self::author or
self::editor)]"/>
    </publication>

    <!-- calls template to create "pubper" elements,
one per publication per pub author -->
    <xsl:apply-templates select="author|editor"/>
</xsl:template>
    
<!-- creates "pubper" elements -->
<xsl:template match="author|editor">
<xsl:if test="normalize-space(.)">
    <pubper>

        <!-- gets "pubid" from parent  -->
        <pubid>
            <xsl:value-of select="../@pubid"/>
        </pubid>
    
        <!-- gets "perid" from "$persons" variable -->
        <perid>   
            
            <!-- Note that in XSLT 1.0 a result tree
fragment like "$persons" does not automatically
convert to a node set.  Therefore
most processors provide an extension function for that
purpose (like "xalan:nodeset()" below) -->
            <xsl:value-of
xmlns:xalan="http://xml.apache.org/xalan"
select="xalan:nodeset($persons)/person[current()=personname]/@perid"
            />
        </perid>
        <!-- persontype: 2 for editors, 1 for authors -->
        <persontype>
            <xsl:choose>
                <xsl:when test="self::editor"><xsl:text>2</xsl:text></xsl:when>
                <xsl:otherwise><xsl:text>1</xsl:text></xsl:otherwise>
            </xsl:choose>
        </persontype>
    </pubper>
</xsl:if>
</xsl:template>
        
</xsl:transform>

--- Roger Glover <glover_roger(_at_)yahoo(_dot_)com> wrote:
Jinesh Varia wrote:

Are you some kind of XML jini!

Far from it.  Just ask the *real* regulars. :-)


Thank you very much. I have been entangled in this XSL
programming for two weeks, and you solved it in a
blink.

You were most of the way there, you just needed one
key insight.  It would have taken me somewhat longer
to write this starting with just an idea.


But there are some serious issues here:

With your approach of generating perids before the
actual separation of publication, person, and pubper
elements, I feel it would not work when I have
500,000 author elements. I have a 130MB XML sheet
which contains almost 350,000 publication elements.
I know you did not know about this. Can you please
comment on this?

Do you think I am right about this? Please correct
me.

I chose this solution not because it was the most
efficient, but because it was the most direct route
I could find from where you were to where you wanted
to be.

Right now it would probably behoove you to spend
some time with the FAQ, the spec and other reference
resources (I like Michael Kay's "XSLT Programmer's
Reference"), studying the syntax and usage of the
"<xsl:key>" element and the "key()" function.  You
should then also look up and study any FAQ reference
to Muenchian grouping.
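
For readers following along, a minimal sketch of what a key
plus Muenchian grouping might look like for the "person"
elements alone. It is an illustration, not the code from this
thread: the key name "person-by-name" is made up, names are
compared after normalize-space(), and the perid is produced
with format-number() rather than the concat/substring padding
used in the final code.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every author and editor by its normalized name
       (the key name is hypothetical) -->
  <xsl:key name="person-by-name"
           match="publication/author | publication/editor"
           use="normalize-space(.)"/>

  <xsl:template match="dblp">
    <dblp>
      <!-- Muenchian grouping: keep a node only if it is the first
           node in the key group for its name; skip empty names -->
      <xsl:for-each select="publication/*[self::author or self::editor]
            [normalize-space(.)]
            [generate-id() =
             generate-id(key('person-by-name', normalize-space(.))[1])]">
        <!-- position() counts only the unique, non-empty names -->
        <person perid="{format-number(position(), '0000000000')}">
          <personname>
            <xsl:value-of select="normalize-space(.)"/>
          </personname>
        </person>
      </xsl:for-each>
    </dblp>
  </xsl:template>

</xsl:transform>
```

The point of the key is that each name is looked up through an
index the processor builds once, instead of rescanning every
preceding author and editor for every element, which is what
makes the "preceding::" approach slow on very large inputs.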


Now there are also editors along with authors.
Authors can also be editors for some publications,
meaning <author>Steve Lawyer</author> for pub1 can be
<editor>Steve Lawyer</editor> for pub2, but we want
to have a single person element generated. In
<pubper> we have <persontype> (1 for author, 2 for
editors), hence in our example it should be
<persontype>1</persontype> for pub1 and
<persontype>2</persontype> for pub2.
How can we store that information with your code,
then? We have to get unique person names.

Match "author | editor" instead of just "author",
and use either "<xsl:if>" or "<xsl:choose>" +
"<xsl:when>" to choose between persontype "1"
(author) and persontype "2" (editor).  Likewise, the
"select" expression on "<xsl:apply-templates>" in the
"persons" variable *would* have to become much more
complicated.  However, if you change to keys and
Muenchian grouping, the expression will be much
simpler.


You don't have a clue how much your code has helped
me!!! I have been working on this for two weeks...
Thanks, Roger. Thank you.

You are very welcome.  Glad to help.  :^)

Let us know if you get stuck, or when you have a
final version.


-- Roger Glover
   glover_roger(_at_)yahoo(_dot_)com



 XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list



=====
-----------------------------------------------------------------
Jinesh Varia
Graduate Student, Information Systems
Pennsylvania State University
Email: jinesh(_at_)psu(_dot_)edu
-----------------------------------------------------------------
'Self is the author of its actions.'
