xsl-list

RE: [xsl] Combining lists without duplication

2007-09-28 13:42:17
I guess there is a node-set that consists of all the subdiv
elements that have nt="V" and a ufi attribute whose value is equal to
the bgn-standard name's ufi. But I don't know how to compare the
iso-name against the whole group of them (as opposed to individually
using for-each).

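It's very late in my workday, and I don't have the energy to work out a
solution for you in detail, but here is an example of how you can match
against the values in a list without using for-each. This requires XSLT 2.0,
where "=" is a general comparison: it is true if any item on the left equals
any item on the right.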

<?xml version="1.0"?>
<fruit>
  <item>apple</item>
  <item>grape</item>
  <item>peach</item>
  <item>pear</item>
  <item>plum</item>
  <item>raspberry</item>
</fruit>

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*" />
  <xsl:output method="text" indent="yes" encoding="UTF-8" />
  
  <xsl:variable name="fruit" select="'plum','peach','banana'"/>

    <xsl:template match="/">
      <xsl:apply-templates />
    </xsl:template>
    
    <xsl:template match="fruit">
      <xsl:apply-templates select="item[.=$fruit]" />
    </xsl:template>

    <xsl:template match="item">
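      <!-- value-of emits a text node, so no separator space is inserted
           between items; &#x0A; is a line feed -->
      <xsl:value-of select="concat(., '&#x0A;')" />
    </xsl:template>

</xsl:stylesheet>

Your output will be:

peach
plum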
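
Applied to the subdivision data, the same general comparison could look
like the sketch below. It rests on assumptions that are not in the original
post: the BGN entries are inlined in a variable named $bgn (a hypothetical
name) rather than read from the separate file, entries are matched on
@fips, and the input is the merged <subdiv> entry with its iso-name and
iso-name2 children.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Assumption: BGN entries inlined so the sketch is self-contained;
       in practice they would be read from the separate BGN-names file. -->
  <xsl:variable name="bgn" as="element(subdiv)*">
    <subdiv ufi="-3378436" fips="AF13" nt="N" short-name="Kabol"
            bgn-name="Velayat-e Kabol" bgn-name-nd="Velayat-e Kabol" />
    <subdiv ufi="-3378436" fips="AF13" nt="V" bgn-name="Velayat-e Kabul" />
    <subdiv ufi="-3378436" fips="AF13" nt="V" bgn-name="Kabul Province" />
    <subdiv ufi="-3378436" fips="AF13" nt="V" bgn-name="Kabol" />
  </xsl:variable>

  <xsl:template match="subdiv">
    <!-- Every BGN name (preferred or variant) recorded for this subdivision -->
    <xsl:variable name="bgn-names"
        select="$bgn[@fips = current()/@fips]/(@short-name, @bgn-name, @bgn-name-nd)" />
    <other-names>
      <!-- not(. = $bgn-names) is true only when the name equals none of them -->
      <xsl:for-each select="basename/(iso-name | iso-name2)[not(. = $bgn-names)]">
        <iso-name><xsl:value-of select="." /></iso-name>
      </xsl:for-each>
    </other-names>
  </xsl:template>

</xsl:stylesheet>
```

Run against the AF13 entry quoted below, this should keep the ISO name
Kabul and drop Kabol, which already appears among the BGN names. The
generally collected names could be filtered the same way against the
combined sequence ($bgn-names, the surviving ISO names).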
-- 
Charles Knell
cknell(_at_)onebox(_dot_)com - email



-----Original Message-----
From:     Roger Sperberg <rsperberg(_at_)yahoo(_dot_)com>
Sent:     Fri, 28 Sep 2007 13:10:57 -0700 (PDT)
To:       xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject:  [xsl] Combining lists without duplication

I've assembled a list of country subdivisions and I'm wanting to
combine two separate sources of names with this list without
duplicating the names. I'm confused as to how best to go about it.

The list I've got is an amalgamation from several sources and does
contain some subdivisions not included in the listings from ISO or BGN
(the U.S. Board on Geographic Names). I've concluded, however, that
names from these sources should be utilized whenever possible.

I've combined the main list and the ISO list so that each entry contains
a section along the following lines. There may or may not be a second
basename element, with one or more iso-names:


<subdiv fips="AF13">
  <basename>
    <name1>Kabul</name1>
    <name2>Kaboul</name2>
    <name3>Kabul</name3>
    <name4>Kabol</name4>
  </basename>
  <basename>
    <iso-name>Kabul</iso-name>
    <iso-name2>Kabol</iso-name2>
  </basename>
</subdiv>

An entry in the separate BGN-names file includes information indicating
whether it is the preferred name (nt="N") or a variant (nt="V"). Each
entry has a unique id for the name (uni) and a unique id for the
subdivision (ufi) that's shared among the variant names for that
subdivision. Preferred names often include a short form. A form of the
name is also included that removes all accents and diacritics
(bgn-name-nd).


Here are the four entries in that file for the subdivision cited above:

<subdiv ufi="-3378436" uni="-4801481" fips="AF13" nt="N" short-name="Kabol"
        bgn-name="Velayat-e Kabol" bgn-name-nd="Velayat-e Kabol" />
<subdiv ufi="-3378436" uni="-4801502" fips="AF13" nt="V"
        bgn-name="Velayat-e Kabul" bgn-name-nd="Velayat-e Kabul" />
<subdiv ufi="-3378436" uni="-4801510" fips="AF13" nt="V"
        bgn-name="Kabul Province" bgn-name-nd="Kabul Province" />
<subdiv ufi="-3378436" uni="523049" fips="AF13" nt="V"
        bgn-name="Kabol" bgn-name-nd="Kabol" />

The result I'd like would:
- use the BGN preferred name's short form, if there is one, as the
  subdivision name
- if not, use the bgn-name
- include the bgn-name and the accent-and-diacritic-free form

All the other names -- BGN variants, ISO names and/or variants, and names
collected from general sources -- should be collected in an other-names
element, with duplicates excluded.

In many instances, BGN includes a variant that matches the short form of
the BGN standard name. I'd like to exclude that.

I'd like to exclude any ISO or generally collected name that matches the
accent-and-diacritic-free form of the preferred name.

And, obviously, I'd like to exclude any ISO name that duplicates the BGN
preferred name or any BGN variant, and exclude any generally collected
name that duplicates a BGN or ISO name.

The result for Kabol would be:


<subdiv fips="AF13">
  <basename>
    <name>Kabol</name>
    <long-form>Velayat-e Kabol</long-form>
    <long-form-nd>Velayat-e Kabol</long-form-nd>
  </basename>
  <other-names>
    <bgn-variant>Velayat-e Kabul</bgn-variant>
    <bgn-variant>Kabul Province</bgn-variant>
    <iso-name>Kabul</iso-name>
    <alt-name>Kabul</alt-name>
    <alt-name>Kaboul</alt-name>
    <alt-name>Kabol</alt-name>
  </other-names>
</subdiv>

Whenever no BGN entry exists, I want to use the first ISO entry for the
name, with all other unique names put into the other-names wrapper.


             *        *         *

When I started working out the XSLT, I began by testing to see if a BGN
name existed. If so, I would use the short form if available, and then
add the variants, testing to see if any of them were the same as
@bgn-name-nd. This would handle 75 to 90 percent of the subdivisions.

Shortly after that point, my understanding of the correct approach began
to crumble.

If an ISO name exists also, I can easily check it against the BGN
standard name and bgn-name-nd, but I'm not sure what the test looks like
against the BGN variants, if there are any. I don't see any way to use
for-each to test against each variant. Nor can I figure out how to rely
on choose/when/otherwise without knowing how many variants there are.


I guess there is a node-set that consists of all the subdiv
elements that have nt="V" and a ufi attribute whose value is equal to
the bgn-standard name's ufi. But I don't know how to compare the
iso-name against the whole group of them (as opposed to individually
using for-each).


And then when I have added iso-names, how do I compare each
generally collected name against the BGN and ISO names? It must be the
same process, but now I'm getting a pretty complicated set.

Guidance, please?


I tried searching the list archives, but (a) I'm not sure how
to term what I'm looking for and (b) I wasn't sure that what I found
actually applied. Just pointing me to the right section in a reference
would be very welcome.


I'm transforming the file using Saxon B 8.9 and XSLT 2.0 so I can use the
third parameter with key().

Thanks.

Roger Sperberg
A not-too-frequent XSLT-er
Montclair, NJ 
--
Cambodian Language Exercises -- cambodian.tiddlyspot.com
Beginning Cambodian Reader -- cambodian-reader.tiddlyspot.com



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



