Hi,
I have been beating my head against the wall on something and was hoping to
get some help with it.
I have an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl"
href="S:\Database\submissions\xslt_reports\subject_list.xsl"?>
<pharmgkb xmlns="http://www.pharmgkb.org/schema/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.pharmgkb.org/schema/
http://www.pharmgkb.org/schema/root.xsd">
<gene pharmgkbId="PA117">
<referenceSequence>
<dnaSequence>ccagTAAGCGCCCTCCTAATCCCCGCAGCGCCACC</dnaSequence>
<experiment>
<sampleSetXref
resource="PharmGKB">PA128747821</sampleSetXref>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746586</subjectXref>
<pcrResult>
<assayXref
resource="local">18</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
</pcrResult>
</genotypesInSubject>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746600</subjectXref>
<pcrResult>
<assayXref
resource="local">18</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
</pcrResult>
</genotypesInSubject>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746573</subjectXref>
<pcrResult>
<assayXref
resource="local">18</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
</pcrResult>
</genotypesInSubject>
</experiment>
</referenceSequence>
<referenceSequence>
<dnaSequence>cttcTGCTGTCTCTTCTGAG</dnaSequence>
<experiment>
<sampleSetXref
resource="PharmGKB">PA128747821</sampleSetXref>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746541</subjectXref>
<pcrResult>
<assayXref
resource="local">19</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
<variant>
<position>320</position>
<firstAllele>G</firstAllele>
<secondAllele>G</secondAllele>
</variant>
<variant>
<position>315</position>
<firstAllele>C</firstAllele>
<secondAllele>G</secondAllele>
</variant>
<variant>
<position>465</position>
<firstAllele>T</firstAllele>
<secondAllele>C</secondAllele>
</variant>
<variant>
<position>194</position>
<firstAllele>C</firstAllele>
<secondAllele>C</secondAllele>
</variant>
</pcrResult>
</genotypesInSubject>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746573</subjectXref>
<pcrResult>
<assayXref
resource="local">19</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
<variant>
<position>194</position>
<firstAllele>C</firstAllele>
<secondAllele>C</secondAllele>
</variant>
<variant>
<position>465</position>
<firstAllele>T</firstAllele>
<secondAllele>C</secondAllele>
</variant>
<variant>
<position>315</position>
<firstAllele>C</firstAllele>
<secondAllele>G</secondAllele>
</variant>
<variant>
<position>320</position>
<firstAllele>G</firstAllele>
<secondAllele>G</secondAllele>
</variant>
</pcrResult>
</genotypesInSubject>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746574</subjectXref>
<pcrResult>
<assayXref
resource="local">19</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
<variant>
<position>320</position>
<firstAllele>G</firstAllele>
<secondAllele>G</secondAllele>
</variant>
<variant>
<position>315</position>
<firstAllele>C</firstAllele>
<secondAllele>G</secondAllele>
</variant>
<variant>
<position>194</position>
<firstAllele>C</firstAllele>
<secondAllele>C</secondAllele>
</variant>
<variant>
<position>465</position>
<firstAllele>T</firstAllele>
<secondAllele>C</secondAllele>
</variant>
</pcrResult>
</genotypesInSubject>
<genotypesInSubject>
<subjectXref
resource="PharmGKB">PA126746575</subjectXref>
<pcrResult>
<assayXref
resource="local">19</assayXref>
<sequencedBothStrands>true</sequencedBothStrands>
<variant>
<position>465</position>
<firstAllele>C</firstAllele>
<secondAllele>C</secondAllele>
</variant>
<variant>
<position>315</position>
<firstAllele>G</firstAllele>
<secondAllele>G</secondAllele>
</variant>
<variant>
<position>194</position>
<firstAllele>C</firstAllele>
<secondAllele>C</secondAllele>
</variant>
<variant>
<position>320</position>
<firstAllele>G</firstAllele>
<secondAllele>G</secondAllele>
</variant>
</pcrResult>
</genotypesInSubject>
</experiment>
</referenceSequence>
</gene>
</pharmgkb>
I need to take this and turn it into a spreadsheet like this:
Experiment 1
subject Variant1 Variant2 Variant3
1 A/A A/G
2 C/A
3
So I thought I would output it as a CSV file like this (excluding experiments
for now):
,Variant1,Variant2,...
subjectID,allele1/allele2,etc.
What I need, then, is to reorganize the input document by subject and, for each
subject, output one row of data, grouped by experiment and then by variant
position.
I have tried many different things with keys, variables, and Muenchian grouping,
and the best I can do so far is get the unique subjectXref values (the unique
subjects). So far I have been unable to build nested levels of grouping. I
thought I could make a key for an <experiment> node using position(), but I
found a post explaining that position() won't work in a key, since it will
always be 1.
I haven't worked with XSLT for a few years now, and seem to be having trouble
getting anything that works.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:n1="http://www.pharmgkb.org/schema/"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:output method="text" indent="yes"/>
<!--index all the subject elements in doc by value of subjectXref-->
<xsl:key name="subject-key" match="//n1:subjectXref" use="."/>
<xsl:key name="exp-variant-key" match="//n1:variant" use="n1:position"/>
<xsl:template match="/">
<!--get unique subjects -->
<xsl:variable name="unique-subjects"
select="//n1:subjectXref[generate-id(.)=generate-id(key('subject-key', .))]"/>
<!--get all the experiments-->
<xsl:variable name="expNodes" select="//n1:experiment"/>
<!--get all the variants-->
<xsl:variable name="varNodes" select="//n1:variant"/>
<!--loop on unique subjects-->
<xsl:for-each select="$unique-subjects">
<!-- Save off subject value -->
<xsl:variable name="subj" select="."/>
<!-- Output a newline, the subject ID (the value of the
subjectXref element), and a comma -->
<xsl:text>&#10;</xsl:text>
<xsl:value-of select="."/>
<xsl:text>,</xsl:text>
<!-- for each experiment -->
<xsl:for-each select="$expNodes">
<!--get all genotypes in each experiment for the
current subject-->
<xsl:variable name="genotypeNode"
select="n1:genotypesInSubject[n1:subjectXref=$subj]"/>
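<!--remember the current experiment; the context node changes in the inner loop-->
<xsl:variable name="exp" select="."/>
<!--get unique variants (by position, document-wide)-->
<xsl:variable name="unique-exp-variants"
select="//n1:variant[generate-id(.)=generate-id(key('exp-variant-key',n1:position))]"/>
<xsl:for-each select="$unique-exp-variants">
<!--keep only the variants that belong to the current experiment-->
<xsl:if test="generate-id(ancestor::n1:experiment)=generate-id($exp)">
<xsl:value-of select="n1:position"/>
<xsl:text>,</xsl:text>
</xsl:if>
</xsl:for-each>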
<!--output the alleles of each variant recorded for the
current subject in this experiment-->
<xsl:for-each select="$genotypeNode/n1:pcrResult/n1:variant">
<xsl:value-of select="n1:firstAllele"/>
<xsl:text>/</xsl:text>
<xsl:value-of select="n1:secondAllele"/>
<xsl:text>,</xsl:text>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Thanks!
-mat
I have looked at the info on Jeni's site about grouping and multiple
grouping, and trawled for other posts, but have yet to translate that into
working bits of code. I am doing this in XML Spy with the default processor,
if that helps.
Anyway, if anyone has some quick pointers (or some extra brains they can send
for me to insert into my head) that would be nice :-)
-mat
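The two-level grouping asked about above (unique subjects, then unique variant
positions within each experiment) can probably be handled with a composite
Muenchian key. position() is no use inside a key, but generate-id() is, so the
key value can combine the ID of the owning <experiment> with the variant
position. What follows is only a minimal sketch for an XSLT 1.0 processor and
has not been run in XML Spy. The names exp-pos-key, $columns and $hit are made
up for illustration, and it assumes experiments are never nested and that a
subject has at most one variant per position within an experiment.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:n1="http://www.pharmgkb.org/schema/">
  <xsl:output method="text"/>

  <!-- index every subjectXref by its value (used to find unique subjects) -->
  <xsl:key name="subject-key" match="n1:subjectXref" use="."/>
  <!-- composite key: owning experiment + position, so the same position
       in two different experiments forms two separate groups -->
  <xsl:key name="exp-pos-key" match="n1:variant"
      use="concat(generate-id(ancestor::n1:experiment), '|', n1:position)"/>

  <xsl:template match="/">
    <xsl:for-each select="//n1:experiment">
      <xsl:variable name="exp" select="."/>
      <!-- one representative variant per distinct position in this experiment -->
      <xsl:variable name="columns"
          select=".//n1:variant[generate-id() = generate-id(key('exp-pos-key',
                  concat(generate-id($exp), '|', n1:position))[1])]"/>

      <!-- header: experiment number, then one column per position -->
      <xsl:text>Experiment </xsl:text>
      <xsl:value-of select="position()"/>
      <xsl:text>&#10;subject</xsl:text>
      <xsl:for-each select="$columns">
        <xsl:sort select="n1:position" data-type="number"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="n1:position"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>

      <!-- one row per unique subject in the whole document -->
      <xsl:for-each select="//n1:subjectXref[generate-id() =
                            generate-id(key('subject-key', .)[1])]">
        <xsl:variable name="subj" select="string(.)"/>
        <xsl:value-of select="$subj"/>
        <xsl:for-each select="$columns">
          <xsl:sort select="n1:position" data-type="number"/>
          <!-- this subject's variant at this position in this experiment, if any -->
          <xsl:variable name="hit"
              select="key('exp-pos-key',
                          concat(generate-id($exp), '|', n1:position))
                      [ancestor::n1:genotypesInSubject/n1:subjectXref = $subj][1]"/>
          <xsl:text>,</xsl:text>
          <xsl:if test="$hit">
            <xsl:value-of
                select="concat($hit/n1:firstAllele, '/', $hit/n1:secondAllele)"/>
          </xsl:if>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Sorting the columns numerically in both the header and the row loops keeps the
cells lined up even though the variants appear in a different order under each
subject. With the sample file above, the header for the second experiment should
list positions 194, 315, 320 and 465. A subject with no result at a position
gets an empty cell.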
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list