The general structure of the problem seems very similar to that of my paper
http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
and I think it is best tackled using a similar 2-stage approach: first parse
the records using regular expressions, then group them, recursively. Of
course both stages are much easier using XSLT 2.0.
The first stage is to parse the text into elements that retain the
hierarchic numbering, for example
<p id="I1" nr="1" details="Johann der Alchemist, renounced his rights of
succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)"/>
<p id="I2" nr="1.1" details="Rudolf, b.and d.1424"/>
Of course you can do more parsing at this stage if you want, but I don't
think that part is critical to the problem.
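A minimal sketch of this first stage, assuming XSLT 2.0 and assuming the text has already been tidied so that each record sits on a single line. The parameter name input-uri, the file name hohenzollern.txt, the template name main and the wrapper element records are illustrative assumptions, not part of the original problem. The id is just a running number, as in the example above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: URI of the plain-text file, one record per line -->
  <xsl:param name="input-uri" as="xs:string" select="'hohenzollern.txt'"/>

  <xsl:output indent="yes"/>

  <!-- Named entry point; there is no source document in this stage -->
  <xsl:template name="main">
    <records>
      <!-- Keep only lines that start with a hierarchic number such as "1." or "1.2." -->
      <xsl:for-each select="tokenize(unparsed-text($input-uri), '\r?\n')
                            [matches(., '^\s*(\d+\.)+')]">
        <xsl:variable name="seq" select="position()"/>
        <!-- group 1 = the number including its trailing dot, group 3 = the rest -->
        <xsl:analyze-string select="." regex="^\s*((\d+\.)+)\s*(.*?)\s*$">
          <xsl:matching-substring>
            <p id="I{$seq}"
               nr="{replace(regex-group(1), '\.$', '')}"
               details="{regex-group(3)}"/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this would be run by naming main as the initial template. Records that wrap over several lines, as in the posted sample, would need to be joined up before this step.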
Then in stage 2 you need to do some grouping to create the family elements.
In your first grouping phase you want to group by the value of tokenize(@nr,
'\.')[1]. Then each of these groups is (recursively) grouped using the key
tokenize(@nr, '\.')[2]; and so on, until the groups are empty.
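The recursion might be sketched as follows, assuming XSLT 2.0 and an input document whose outermost element directly contains the p elements shown above, with @nr values that have no trailing dot. The element names tree and person and the template name group-level are hypothetical, chosen only for illustration. The family and child elements of the target format would be generated from the same nesting.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Start with all records, grouping on the first component of @nr -->
  <xsl:template match="/*">
    <tree>
      <xsl:call-template name="group-level">
        <xsl:with-param name="records" select="p"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </tree>
  </xsl:template>

  <!-- Group $records by the component of @nr at position $level, then recurse -->
  <xsl:template name="group-level">
    <xsl:param name="records" as="element(p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$records"
                        group-by="tokenize(@nr, '\.')[$level]">
      <!-- The head of a group is the record with exactly $level components -->
      <xsl:variable name="head"
          select="current-group()[count(tokenize(@nr, '\.')) = $level]"/>
      <person id="{$head/@id}" nr="{$head/@nr}">
        <xsl:copy-of select="$head/@details"/>
        <!-- The remaining records are descendants; group them one level deeper -->
        <xsl:call-template name="group-level">
          <xsl:with-param name="records" select="current-group() except $head"/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </person>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

The recursion stops by itself: once a group holds only its head record, the next call receives an empty sequence and xsl:for-each-group produces nothing.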
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Vadim Verenich [mailto:vadimverenich(_at_)gmail(_dot_)com]
Sent: 25 August 2008 08:18
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Flat genealogical structure to organized
parent-child relationships
Dear XSLT Experts,
I am having problems converting a flat XML file
into hierarchically nested XML.
Last month I read Chapter 19 of Michael Kay's book (it
deals with converting an unparsed GEDCOM text file into an XML
structure) and was very impressed.
Since then I have converted all my GEDCOM files into
GedcomXML format; however, some of the genealogical data in
my digital archive is organized in a more classical text
format rather than the commonly accepted GEDCOM format.
I will use part of Paul Thereof's Hohenzollern genealogy
scheme to illustrate what this format looks like.
The text format is as follows:
//sample
1.Johann der Alchemist, renounced his rights of succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)
1.1.Rudolf, b.and d.1424
1.2.Barbara (1423-1481); m.1433 Luigi III Gonzaga, Margrave of Mantua (d.1478)
1.3.Elisabeth (1425-after 13 Jan 1465); m.1st 1437 Duke Joachim of Pomerania (d.1451); m.2d 1453 Duke Wratislaw X of Pomerania (d.1478)
1.4.Dorothea (1430-1495); m.1st 1445 King Christof III of Denmark (d.1448); m.2d 1449 King Christian I of Denmark (d.1481)
2.Friedrich II, Elector of Brandenburg (1413-1471); m.1441 Pss Katharina of Saxony (1421-1476)
2.1.Johann (1452-1454)
2.2.Erasmus, b.after 1452, d.1464/5
2.3.Dorothea (1446-1519); m.1464 Duke Johann V of Saxe-Lauenburg (d1507)
2.4.Margarete, d.1489; m.ca 1477 Duke Bogislaw X of Pomerania (d.1523)
3.Albrecht Achilles, Elector of Brandenburg (1414-1486); he laid down the family rule rare among German families, the key to its future success, that Brandenburg would never be divided, but always inherited by the eldest son, and that the territories of Ansbach and Bayreuth could be given to younger sons, but not further subdivided; he m.1st 1446 Mgvine Margarete of Baden (d.1457); m.2d 1458 Pss Anna of Saxony (1437-1512)
//
The analysis of data structure:
As you can see, the data is organized much like a flat
two-dimensional array (with two axes: vertical and horizontal).
The family relations are represented along the horizontal axis,
while the vertical axis expresses parent-child
relationships through classifiers of the form 1.n., where "."
is the delimiter between two successive generations
and n identifies an individual within a given generation.
For example, the individual with classifier 1.1. is a child of 1, and so on.
I converted this plain text format into CSV data and then
used Altova MapForce to map some fields of this structure
against Rob Mckinnon's Genview XSD schema, which gave the
following result:
<?xml version="1.0" encoding="UTF-8"?>
<genview>
<individual id="@I1@">
<name first="Johann der Alchemist" />
</individual>
<family id="@F1@">
<father ref="@I1@" />
<child />
</family>
<individual id="@I11@">
<name first="Rudolf" />
</individual>
<family id="@F2@">
<father ref="@I11@" />
<child />
<mother/>
</family>
<individual id="@I12@">
<name first=" Barbara " />
</individual>
<family id="@F3@">
<father />
<child />
<mother ref="@I12@"/>
</family>
<individual id="@I13@">
<name first=" Elisabeth " />
</individual>
<family id="@F4@">
<father />
<child />
<mother ref="@I13@"/>
</family>
<individual id="@I14@" >
<name first="Dorothea" />
</individual>
<family id="@F5@">
<father/>
<child />
<mother ref="@I14@"/>
</family>
<individual id="@I2@" >
<name first="Friedrich II" />
</individual>
<family id="@F6@">
<father ref="@I2@" />
<child />
<mother />
</family>
<individual id="@I21@">
<name first="Johann" />
</individual>
<family id="@F7@">
<father ref="@I21@" />
<child />
<mother />
</family>
<individual id="@I22@">
<name first="Erasmus" />
</individual>
<family id="@F8@">
<father ref="@I22@" />
<child />
<mother />
</family>
<individual id="@I23@">
<name first="Dorothea" />
</individual>
<family id="@F9@">
<father />
<child />
<mother ref="@I23@"/>
</family>
<individual id="@I24@">
<name first="Margarete" />
</individual>
<family id="@F10@">
<father />
<child />
<mother ref="@I24@"/>
</family>
<individual id="@I3@">
<name first="Albrecht Achilles" />
</individual>
<family id="@F11@">
<father ref="@I3@" />
<child />
<mother />
</family>
</genview>
It appears that the genealogical data was mapped more or less
correctly, but the major issue with this output is that it lacks
parent-child relations. I need some XSL technique
to define the relations between parents and children. I tried to
use the Muenchian method, but it seems illogical to apply it
to defining hierarchical relations. What seems more logical
to me is to write an XSL transformation that treats the
unique ID of each individual as a variable or
a key and compares it against the other ID
classifiers, looking for IDs that extend it
(1.1 and 1.1.1; 1.2 and
1.2.1, 1.2.3, etc.). When a child ID classifier is found, it
should be written as a child reference (@ref) in a
/genview/family/child node.
The required output is as follows:
<genview>
<individual id="@I1@">
<name first="Johann der Alchemist" />
</individual>
<family id="@F1@">
<father ref="@I1@" />
<child ref="@I11@"/>
<child ref="@I12@"/>
<child ref="@I13@"/>
<child ref="@I14@"/>
</family>
<individual id="@I11@">
<name first="Rudolf" />
</individual>
<family id="@F2@">
<father ref="@I11@" />
<child />
<mother/>
</family>
<individual id="@I12@">
<name first=" Barbara " />
</individual>
<family id="@F3@">
<father />
<child />
<mother ref="@I12@"/>
</family>
<individual id="@I13@">
<name first=" Elisabeth " />
</individual>
<family id="@F4@">
<father />
<child />
<mother ref="@I13@"/>
</family>
<individual id="@I14@" >
<name first="Dorothea" />
</individual>
<family id="@F5@">
<father/>
<child />
<mother ref="@I14@"/>
</family>
<individual id="@I2@" >
<name first="Friedrich II" />
</individual>
<family id="@F6@">
<father ref="@I2@" />
<child ref="@I21@"/>
<child ref="@I22@"/>
<child ref="@I23@"/>
<mother />
</family>
<individual id="@I21@">
<name first="Johann" />
</individual>
<family id="@F7@">
<father ref="@I21@" />
<child />
<mother />
</family>
<individual id="@I22@">
<name first="Erasmus" />
</individual>
<family id="@F8@">
<father ref="@I22@" />
<child />
<mother />
</family>
<individual id="@I23@">
<name first="Dorothea" />
</individual>
<family id="@F9@">
<father />
<child />
<mother ref="@I23@"/>
</family>
</genview>
I am not sure whether this result can be achieved by means of
XSLT transformations.
Thank you for your patience and support. Sincerely,
Vadim Verenich
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--