
RE: [xsl] Flat genealogical structure to organized parent-child relationships

2008-08-25 04:05:33
The general structure of the problem seems very similar to that of my paper

http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html

and I think it is best tackled using a similar 2-stage approach: first parse
the records using regular expressions, then group them, recursively. Of
course both stages are much easier using XSLT 2.0.

The first stage is to parse the text into elements that retain the
hierarchic numbering, for example

<p id="I1" nr="1" details="Johann der Alchemist, renounced his rights of
succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)"/>
<p id="I2" nr="1.1" details="Rudolf, b.and d.1424"/>

Of course you can do more parsing at this stage if you want, but I don't
think that part is critical to the problem.
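
As a rough illustration of this first stage, here is a minimal XSLT 2.0
sketch. It assumes the text has already been normalised to one record per
line; the sample text held in the $text variable, the <people> wrapper
element, the initial template name "main" and the sequential id scheme are
all assumptions made for the example, not part of the original data format.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Assumption: one record per line; in practice this could be read with unparsed-text() -->
  <xsl:variable name="text" as="xs:string">1.Johann der Alchemist, renounced his rights of succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)
1.1.Rudolf, b.and d.1424
1.2.Barbara (1423-1481); m.1433 Luigi III Gonzaga, Margrave of Mantua (d.1478)
2.Friedrich II, Elector of Brandenburg (1413-1471); m.1441 Pss Katharina of Saxony (1421-1476)
2.1.Johann (1452-1454)</xsl:variable>

  <!-- Hypothetical entry point: invoke this named template as the initial template -->
  <xsl:template name="main">
    <!-- Split into lines and trim surrounding whitespace -->
    <xsl:variable name="lines" as="xs:string*"
        select="for $l in tokenize($text, '\n') return normalize-space($l)"/>
    <people>
      <!-- Keep only lines that start with a hierarchic number followed by a dot -->
      <xsl:for-each select="$lines[matches(., '^\d+(\.\d+)*\.')]">
        <xsl:variable name="seq" select="position()"/>
        <!-- Group 1 is the hierarchic number, group 3 is the rest of the record -->
        <xsl:analyze-string select="." regex="^(\d+(\.\d+)*)\.\s*(.*)$">
          <xsl:matching-substring>
            <p id="I{$seq}" nr="{regex-group(1)}" details="{regex-group(3)}"/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </people>
  </xsl:template>

</xsl:stylesheet>
```

The trailing dot of each number is consumed by the regular expression, so
@nr holds "1.1" rather than "1.1.", which keeps the later tokenizing simple.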

Then in stage 2 you need to do some grouping to create the family elements. 

In your first grouping phase you want to group by the value of tokenize(@nr,
'\.')[1]. Then each of these groups is (recursively) grouped using the key
tokenize(@nr, '\.')[2]; and so on, until the groups are empty.
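
The recursion can be sketched roughly as follows in XSLT 2.0. This is only
one way to write it: it assumes the stage 1 output is a <people> element
containing the <p> elements, and the names group-level, <tree> and <person>
are invented for the example. It builds a nested tree; the family and child
elements would still have to be derived from that nesting.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Assumed input: <people> wrapping the <p id=".." nr=".." details=".."/> records -->
  <xsl:template match="/people">
    <tree>
      <xsl:call-template name="group-level">
        <xsl:with-param name="people" select="p"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </tree>
  </xsl:template>

  <!-- Group $people on the $level-th component of @nr, then recurse one level down -->
  <xsl:template name="group-level">
    <xsl:param name="people" as="element(p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$people"
                        group-by="tokenize(@nr, '\.')[$level]">
      <!-- The record with exactly $level components heads the group -->
      <xsl:variable name="head"
          select="current-group()[count(tokenize(@nr, '\.')) eq $level]"/>
      <!-- Records with more components are its descendants -->
      <xsl:variable name="rest"
          select="current-group()[count(tokenize(@nr, '\.')) gt $level]"/>
      <person>
        <xsl:copy-of select="$head/@*"/>
        <!-- Recursion ends by itself when $rest is empty -->
        <xsl:call-template name="group-level">
          <xsl:with-param name="people" select="$rest"/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </person>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

In the resulting tree the children of an individual are simply the <person>
elements directly inside it, so a child reference for each of them could be
written out from there.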

Michael Kay
http://www.saxonica.com/ 

-----Original Message-----
From: Vadim Verenich [mailto:vadimverenich(_at_)gmail(_dot_)com] 
Sent: 25 August 2008 08:18
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Flat genealogical structure to organized 
parent-child relationships

Dear XSLT Experts,
I am having problems converting a flat XML file into 
hierarchically nested XML.
Last month I read Chapter 19 of Michael Kay's book (it 
deals with converting an unparsed GEDCOM text file into an XML 
structure) and was very impressed.
Since then I have converted all my GEDCOM files into 
GedcomXML format; however, some of the genealogical data in 
my digital archive is organized in a more classical text 
format rather than the commonly accepted GEDCOM format.
I will use part of Paul Thereof's Hohenzollern genealogy 
scheme to illustrate what this format looks like.
The text format is as follows:
//sample
1.Johann der Alchemist, renounced his rights of succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)
1.1.Rudolf, b.and d.1424
1.2.Barbara (1423-1481); m.1433 Luigi III Gonzaga, Margrave of Mantua (d.1478)
1.3.Elisabeth (1425-after 13 Jan 1465); m.1st 1437 Duke Joachim of Pomerania (d.1451); m.2d 1453 Duke Wratislaw X of Pomerania (d.1478)
1.4.Dorothea (1430-1495); m.1st 1445 King Christof III of Denmark (d.1448); m.2d 1449 King Christian I of Denmark (d.1481)
2.Friedrich II, Elector of Brandenburg (1413-1471); m.1441 Pss Katharina of Saxony (1421-1476)
2.1.Johann (1452-1454)
2.2.Erasmus, b.after 1452, d.1464/5
2.3.Dorothea (1446-1519); m.1464 Duke Johann V of Saxe-Lauenburg (d1507)
2.4.Margarete, d.1489; m.ca 1477 Duke Bogislaw X of Pomerania (d.1523)
3.Albrecht Achilles, Elector of Brandenburg (1414-1486); he laid down the family rule rare among German families, the key to its future success, that Brandenburg would never be divided, but always inherited by the eldest son, and that the territories of Ansbach and Bayreuth could be given to younger sons, but not further subdivided; he m.1st 1446 Mgvine Margarete of Baden (d.1457); m.2d 1458 Pss Anna of Saxony (1437-1512)
//
The analysis of data structure:
As you can see, the data is organized like a two-dimensional 
flat array (with two axes: vertical and horizontal).
The family relations are represented along the horizontal axis, 
while the vertical axis declares parent-child relationships 
as strings of the form 1.n., where "." is the delimiter 
between two successive generations, and n identifies an 
individual within a given generation.
For example, an individual with classifier 1.1. is a child of 1, etc.
I converted this plain text format into CSV data and then 
used Altova MapForce to map some fields of this structure 
against Rob Mckinnon's Genview XSD schema, which gave the 
following result:
<?xml version="1.0" encoding="UTF-8"?>
<genview>
 <individual id="@I1@">
 <name first="Johann der Alchemist" />
 </individual>
 <family id="@F1@">
 <father ref="@I1@" />
 <child />
 </family>
 <individual id="@I11@">
 <name first="Rudolf" />
 </individual>
 <family id="@F2@">
 <father ref="@I11@" />
 <child />
 <mother/>
 </family>
 <individual id="@I12@">
 <name first=" Barbara " />
 </individual>
 <family id="@F3@">
 <father />
 <child />
 <mother ref="@I12@"/>
 </family>
 <individual id="@I13@">
 <name first=" Elisabeth " />
 </individual>
 <family id="@F4@">
 <father />
 <child />
 <mother ref="@I13@"/>
 </family>
 <individual id="@I14@" >
 <name first="Dorothea" />
 </individual>
 <family id="@F5@">
 <father/>
 <child />
 <mother ref="@I14@"/>
 </family>
 <individual id="@I2@" >
 <name first="Friedrich II" />
 </individual>
 <family id="@F6@">
 <father ref="@I2@" />
 <child />
 <mother />
 </family>
 <individual id="@I21@">
 <name first="Johann" />
 </individual>
 <family id="@F7@">
 <father ref="@I21@" />
 <child />
 <mother />
 </family>
 <individual id="@I22@">
 <name first="Erasmus" />
 </individual>
 <family id="@F8@">
 <father ref="@I22@" />
 <child />
 <mother />
 </family>
 <individual id="@I23@">
 <name first="Dorothea" />
 </individual>
 <family id="@F9@">
 <father />
 <child />
 <mother ref="@I23@"/>
 </family>
 <individual id="@I24@">
 <name first="Margarete" />
 </individual>
 <family id="@F10@">
 <father />
 <child />
 <mother ref="@I24@"/>
 </family>
 <individual id="@I3@">
 <name first="Albrecht Achilles" />
 </individual>
 <family id="@F11@">
 <father ref="@I3@" />
 <child />
 <mother />
 </family>

</genview>
It appears that the genealogical data was mapped more or less 
correctly, but the major issue with this format is that it 
lacks parent-child relations. I need some XSL technique to 
define the relations between parents and children. I tried to 
use the Muenchian method, but it does not seem a logical fit 
for defining hierarchical relations. What seems more logical 
to me is to write XSL transformation routines that define the 
unique ID of each individual in this context as a variable or 
a key, and compare this value against the other ID 
classifiers to find those that extend it with further 
components (1.1 and 1.1.1; 1.2 and 1.2.1, 1.2.3, etc.). When a 
child ID classifier is found, it should then be written as a 
child reference (@ref) within the /genview/family/child node.
The required output is as follows:


<genview>
 <individual id="@I1@">
 <name first="Johann der Alchemist" />
 </individual>
 <family id="@F1@">
 <father ref="@I1@" />
 <child ref="@I11@"/>
 <child ref="@I12@"/>
 <child ref="@I13@"/>
 <child ref="@I14@"/>
 </family>
 <individual id="@I11@">
 <name first="Rudolf" />
 </individual>
 <family id="@F2@">
 <father ref="@I11@" />
 <child />
 <mother/>
 </family>
 <individual id="@I12@">
 <name first=" Barbara " />
 </individual>
 <family id="@F3@">
 <father />
 <child />
 <mother ref="@I12@"/>
 </family>
 <individual id="@I13@">
 <name first=" Elisabeth " />
 </individual>
 <family id="@F4@">
 <father />
 <child />
 <mother ref="@I13@"/>
 </family>
 <individual id="@I14@" >
 <name first="Dorothea" />
 </individual>
 <family id="@F5@">
 <father/>
 <child />
 <mother ref="@I14@"/>
 </family>
 <individual id="@I2@" >
 <name first="Friedrich II" />
 </individual>
 <family id="@F6@">
 <father ref="@I2@" />
 <child ref="@I21@"/>
 <child ref="@I22@"/>
 <child ref="@I23@"/>
 <mother />
 </family>
 <individual id="@I21@">
 <name first="Johann" />
 </individual>
 <family id="@F7@">
 <father ref="@I21@" />
 <child />
 <mother />
 </family>
 <individual id="@I22@">
 <name first="Erasmus" />
 </individual>
 <family id="@F8@">
 <father ref="@I22@" />
 <child />
 <mother />
 </family>
 <individual id="@I23@">
 <name first="Dorothea" />
 </individual>
 <family id="@F9@">
 <father />
 <child />
 <mother ref="@I23@"/>
 </family>

</genview>
I am not sure whether this result can be achieved by means of 
XSLT transformations.
Thank you for your patience and support.
Sincerely,

Vadim Verenich

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


