xsl-list

Re: sorting titles w stopwords but w/o value in every title node

2004-09-01 13:44:33
Anton and Bruce,
Thanks for your help.  I'm sorry for the delay in responding.  A large tree 
fell on my house about 1 AM Tuesday morning and I have been away from work 
finding a tree service and contractors, etc.  It's quite a challenge.

I cannot do a triple sort using doc-number as the first sort.  That just puts 
things in doc-number order.  I don't think I can group on doc-number and then 
sort by title within that group. I think xsl:sort needs a path name.  

Anton says it succinctly: I need to treat records that don't have a title as if
they do have a title. The link is that they have the same document number.  I
need the records with the same doc number to show up with the corresponding
title in arrival-date order.
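
One way to express "treat untitled records as if they had the title that belongs to their doc-number" in XSLT 1.0 is a key lookup inside xsl:sort, whose select attribute accepts any XPath expression and not only a path. The following is a minimal sketch under stated assumptions, not the stylesheet from this thread: the key name title-by-doc is made up for illustration, the loop selects section-02 records rather than their title children, stop-word stripping is left out for brevity, and dates sort ascending to match the desired output shown later in the thread.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical key: index every non-empty title by its record's doc-number. -->
  <xsl:key name="title-by-doc"
           match="section-02/title[normalize-space()]"
           use="../doc-number"/>

  <xsl:template match="/">
    <table>
      <xsl:for-each select="//section-02">
        <!-- 1. Sort on the title borrowed from any record with the same doc-number.
             If several records carry a title, the first in document order is used. -->
        <xsl:sort select="key('title-by-doc', doc-number)"/>
        <!-- 2. Keep different documents with identical titles apart. -->
        <xsl:sort select="doc-number" data-type="number"/>
        <!-- 3. Arrival date, rearranged from MM/DD/YYYY to YYYYMMDD. -->
        <xsl:sort data-type="number"
                  select="concat(substring(arrival-date, 7, 4),
                                 substring(arrival-date, 1, 2),
                                 substring(arrival-date, 4, 2))"/>
        <tr>
          <td><xsl:value-of select="doc-number"/></td>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="description"/></td>
          <td><xsl:value-of select="arrival-date"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

This runs in a single pass, and each row still prints only the record's own title, so untitled rows stay blank in the Title column. Records whose doc-number has no titled record anywhere would still sort first, because their sort key is the empty string.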

The processor is Saxon but it's being called from within another application.  
I do not believe I can do a two-step process.  That's why I'm calling the 
stopwords with document() from this stylesheet.
sc
------------------------------

Date: Mon, 30 Aug 2004 09:01:10 -0400
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
From: "Susan Campbell" <SCampbell(_at_)ccla(_dot_)lib(_dot_)fl(_dot_)us>
Subject:  Re: [xsl] sorting titles w stopwords but w/o value in every title node
Message-ID: 
<D44554884CB7D74B87423B62952F369901BD1DF1(_at_)mailbox(_dot_)ccla(_dot_)lib(_dot_)fl(_dot_)us>

Thanks for the help. (I am still referring to the stop-words variable
with document('')/xsl:stylesheet/sw:stop/word because that does give me
the sort order. Because of our setup, that may be my only option.)

The problem I still have is that entries without a value in the title
sort first.
I need to group by title when the doc-number is the same. It may be both
a sorting and grouping problem, but I don't know how to go about it.

(The doc number is included only for testing. I left out imprint and
ISBN from this sample for clarity. It is possible to have the same issue
or different issue arrive on the same or different days as there are
multiple subscriptions.)

The output I need is:
doc#   Title                        Description              Arrived date
53690  American Artist              v.68:no.738(2004:Jan.)   02/26/2004
57769  The American city & country  v.119:no.1(2004:Jan.)    02/11/2004
57769                               v.119:no.3(2004:Mar.)    03/25/2004
58345  American demographics        v.26:no.1(2004:Feb.)     02/05/2004
58345                               v.26:no.1(2004:Feb.)     02/26/2004
58345                               v.26:no.2(2004:Mar.)     02/26/2004
58345                               v.26:no.2(2004:Mar.)     02/26/2004

Sample of problem causing xml:
-------------
<section-02>
<title>Forbes.</title>
<isbn-issn>0015-6914</isbn-issn>
<doc-number>58615</doc-number>
<description>v.173:no.5(2004:Mar.15)</description>
<arrival-date>03/15/2004</arrival-date>
</section-02>

<section-02>
<title></title>
<isbn-issn-code></isbn-issn-code>
<doc-number>58615</doc-number>
<description>v.173:no.1(2004:Jan. 12)</description>
<arrival-date>01/12/2004</arrival-date>
</section-02>

<section-02>
<title></title>
<isbn-issn-code></isbn-issn-code>
<doc-number>58615</doc-number>
<description>v.173:no.2(2004:Feb. 02)</description>
<arrival-date>01/21/2004</arrival-date>
</section-02>

My stylesheet:
-------------
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
   xmlns:sw="mailto:bubba(_at_)aol(_dot_)com"
   exclude-result-prefixes="sw">
<xsl:include href="funcs.xsl"/>
<sw:stop>
        <word>the</word>
        <word>a</word>
        <word>an</word>
</sw:stop>
<xsl:variable name="stop-words"
select="document('')/xsl:stylesheet/sw:stop/word"/>
<xsl:variable name="lowercase"
select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<xsl:template match="/">
<table border="'1'">
<th colspan="6">Arrived Issues sorted without stop words</th>
<tr>
<td align="center"><b/>number</td>
<td align="center"><b/>Title</td>
<td align="center"><b/>ISBN-ISSN</td>
<td align="center"><b/>Imprint</td>
<td align="center"><b/>Description</td>
<td align="center"><b/>Arrived</td>
</tr>
<xsl:for-each select="//section-02/title">
<xsl:sort select="concat(substring(substring-after(.,' '), 0 div boolean
($stop-words[starts-with(translate(current(), $uppercase, $lowercase),
concat(translate(., $uppercase, $lowercase), ' '))])), substring(., 0 div not
($stop-words[starts-with(translate(current(), $uppercase, $lowercase),
concat(translate(., $uppercase, $lowercase), ' '))])))"/>

<xsl:sort select="number(concat(substring(../arrival-date, 7,4),
substring(../arrival-date, 1,2),
substring(../arrival-date, 4,2)))" order="descending"/>

<tr>
<td width="10%"><xsl:value-of select="../doc-number"/></td>
<td width="30%"><xsl:value-of select="../title" /></td>
<td width="10%"><xsl:value-of select="../isbn-issn"/></td>
<td width="20%"><xsl:value-of select="../imprint"/></td>
<td width="20%"><xsl:value-of select="../description"/></td>
<td width="10%"><xsl:value-of select="../arrival-date"/></td>
</tr>
</tr>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>

Thanks,
Susan Campbell
College Center for Library Automation
1753 W. Paul Dirac Drive
Tallahassee, FL 32310
850-922-6044

------------------------------

Date: Mon, 30 Aug 2004 09:17:01 -0400
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
From: Bruce D'Arcus <bdarcus(_at_)myrealbox(_dot_)com>
Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title node
Message-Id: <E0EAD541-FA86-11D8-B6E0-000A959F0E52(_at_)myrealbox(_dot_)com>

On Aug 30, 2004, at 9:01 AM, Susan Campbell wrote:

The problem I still have is that entries without a value in the title 
sort first.
I need to group by title when the doc-number is the same. It may be 
both a sorting
and grouping problem, but I don't know how to go about it.

So is it the case that if two records -- one with a title and one 
without -- share the same doc-number, then they share the same title, 
even if not explicitly coded?

If that were true, I guess logically you'd group by doc-number, and 
then take a title from one among the group and sort on that for the 
groups?
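
The grouping described above could be sketched in XSLT 1.0 with Muenchian grouping, which fits a single-pass transform. This is an illustration under assumptions rather than a tested answer for the poster's data: the key name rec-by-doc is hypothetical, only the element names from the posted sample are used, and the stop-word handling from the original stylesheet is omitted to keep the grouping visible.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical key: all records that share a doc-number. -->
  <xsl:key name="rec-by-doc" match="section-02" use="doc-number"/>

  <xsl:template match="/">
    <table>
      <!-- Outer loop: one iteration per doc-number (first record of each group). -->
      <xsl:for-each select="//section-02[generate-id() =
                            generate-id(key('rec-by-doc', doc-number)[1])]">
        <!-- Order the groups by the first non-empty title found in the group. -->
        <xsl:sort select="key('rec-by-doc', doc-number)/title[normalize-space()]"/>

        <!-- Inner loop: every record of the group, by arrival date (YYYYMMDD). -->
        <xsl:for-each select="key('rec-by-doc', doc-number)">
          <xsl:sort data-type="number"
                    select="concat(substring(arrival-date, 7, 4),
                                   substring(arrival-date, 1, 2),
                                   substring(arrival-date, 4, 2))"/>
          <tr>
            <td><xsl:value-of select="doc-number"/></td>
            <td><xsl:value-of select="title"/></td>
            <td><xsl:value-of select="description"/></td>
            <td><xsl:value-of select="arrival-date"/></td>
          </tr>
        </xsl:for-each>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Because the groups are formed by doc-number and only ordered by title, two documents that happen to share a title remain separate groups.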

Bruce

------------------------------

Date: Mon, 30 Aug 2004 18:34:32 +0200
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
From: "cking" <cking(_at_)telenet(_dot_)be>
Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title node
Message-ID: <002901c48eaf$3a7e6740$408876d5(_at_)pandora(_dot_)be>

Hi Susan,

Thanks for the help. (I am still referring to the stop-words variable with
document('')/xsl:stylesheet/sw:stop/word because that does give me the sort
order. Because of our setup, that may be my only option.)

I found out why it didn't work for me: it's a namespace issue. I had put your
template inside an XHTML-output stylesheet (with
xmlns="http://www.w3.org/1999/xhtml"), and then
"document('')/xsl:stylesheet/sw:stop/word" didn't return anything. If I
change the <word> elements to <sw:word>, it works.
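
The reason is that, once the stylesheet declares a default namespace, an unprefixed <word> element belongs to that namespace, while an unprefixed name in an XPath 1.0 expression only matches elements in no namespace. A minimal sketch of the prefixed variant follows; the namespace URI urn:example:stop-words is a placeholder chosen for this illustration, and the template exists only to show that the lookup finds something.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:sw="urn:example:stop-words"
    exclude-result-prefixes="sw">

  <!-- Stop words live in the sw namespace, so the default XHTML
       namespace declared above no longer captures them. -->
  <sw:stop>
    <sw:word>the</sw:word>
    <sw:word>a</sw:word>
    <sw:word>an</sw:word>
  </sw:stop>

  <!-- The last step of the path is prefixed to match the elements. -->
  <xsl:variable name="stop-words"
                select="document('')/xsl:stylesheet/sw:stop/sw:word"/>

  <xsl:template match="/">
    <!-- Reports how many stop words the lookup found. -->
    <p><xsl:value-of select="count($stop-words)"/></p>
  </xsl:template>
</xsl:stylesheet>
```

Note that the path in the stop-words variable has to change along with the elements: sw:stop/word would still select nothing once the children are named sw:word.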

The problem I still have is that entries without a value in the title sort
first.
I need to group by title when the doc-number is the same. It may be both a
sorting and grouping problem, but I don't know how to go about it.

(The doc number is included only for testing. I left out imprint and ISBN
from this sample for clarity. It is possible to have the same issue or
different issue arrive on the same or different days as there are multiple
subscriptions.)

Maybe I don't fully understand what you're trying to get (esp. that last
sentence), but can't you simply perform a triple-sort instead of double-sort?
First sort by doc-number, then by title and finally by date?

<xsl:for-each select="//section-02/z13-title">

I guess you're only using "//" in your sample code, because you know this can
seriously slow down the transform process (esp. with large input files)? Unless
of course your input files are organized with <section-02> elements that can
appear anywhere in the document...

Best regards
Anton Triest

------------------------------

Date: Tue, 31 Aug 2004 03:37:28 +0200
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
From: "cking" <cking(_at_)telenet(_dot_)be>
Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title node
Message-ID: <010401c48efb$13a60780$408876d5(_at_)pandora(_dot_)be>

Susan,

I wrote:
but can't you simply perform a triple-sort instead of double-sort? 
First sort by doc-number, then by title and finally by date?

By rereading your message (desired output, and Bruce's reply), I think I
understand your point. You don't want to sort by doc-number. You want to treat
the records that don't have a title, as if they do have a title, taken from
another record with the same doc-number. Is that correct?

What processor are you using? I mean, would it be OK to do a transformation
in two steps? 

Greetings
Anton Triest