xsl-list

Complex splitting of XML tag to multiple other XML tags using XSLT

2002-10-20 06:43:37
Hello XSLT experts!!

We receive XML files from one of our customers and then 
transform them into our own XML format using 
XSLT 1.0 (and Xalan 1.3), but we have a specific problem:

----------
We have the following DTD snippet (for the customer XML):

<!ELEMENT ADLIST (head, lines)>
<!ELEMENT head (#PCDATA)>
<!ELEMENT lines (TeleLine | InetLine)+>
<!ELEMENT TeleLine ( text1?, text2? )>
<!ELEMENT InetLine (#PCDATA)>
<!ELEMENT text1 (#PCDATA)>
<!ELEMENT text2 (#PCDATA)>

In general we want to use XSLT to convert ONE <ADLIST> tag
to ONE <AD> tag, where our own DTD for the <AD> tag is 
the following:

<!ELEMENT AD (head?, lines)>
<!ATTLIST AD SEQ (U|S|M|E) #REQUIRED>
<!ELEMENT head (#PCDATA)>
<!ELEMENT lines (TeleLine | InetLine)+>
<!ELEMENT TeleLine ( text1?, text2? )>
<!ELEMENT InetLine (#PCDATA)>
<!ELEMENT text1 (#PCDATA)>
<!ELEMENT text2 (#PCDATA)>

In doing the one-to-one conversion, we set the SEQ attribute 
to the value 'U' (undefined). 
The one-to-one conversion is NOT a problem!
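For reference, the one-to-one case can be sketched in a few lines of Python with the standard-library xml.etree.ElementTree module (this is not XSLT, and the function name adlist_to_ad is hypothetical). It copies <head> (if present) and <lines> unchanged into a new <AD> and sets SEQ to 'U':

```python
import copy
import xml.etree.ElementTree as ET


def adlist_to_ad(adlist: ET.Element) -> ET.Element:
    """One-to-one conversion: <ADLIST> -> <AD SEQ="U"> with the same children."""
    ad = ET.Element("AD", {"SEQ": "U"})
    # <head> is optional in the AD DTD, so only copy it when it exists
    for name in ("head", "lines"):
        child = adlist.find(name)
        if child is not None:
            ad.append(copy.deepcopy(child))
    return ad
```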
----------

In certain circumstances we want to convert an <ADLIST> tag 
to several <AD> tags, using the SEQ attribute to reflect 
the sequence of the <AD> tags in relation to the 
original <ADLIST>.
The values of this attribute mean 'S' for Start, 
'M' for Middle and 'E' for End.

The rules for splitting the original <ADLIST> tag into 
several <AD> tags are as follows:

1) The <ADLIST> tag must contain:
    a) more than one <TeleLine> tag and at least one 
       <InetLine> tag or
    b) more than one <InetLine> tag and at least one 
       <TeleLine> tag

2) The <ADLIST> tag MUST contain a <TeleLine> tag that 
   contains a <text1> tag and is NOT the first <TeleLine> 
   tag.

3) The <ADLIST> tag must be split at <TeleLine> tags that 
   contain a <text1> tag.
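Rules 1 to 3 can be checked mechanically before any splitting is attempted. A minimal Python sketch using xml.etree.ElementTree (not XSLT; the helper names is_text1_teleline, needs_split and split_points are hypothetical, and it assumes <lines> is always present, as both DTDs require):

```python
import xml.etree.ElementTree as ET
from typing import List


def is_text1_teleline(el: ET.Element) -> bool:
    """True for a <TeleLine> that has a <text1> child (a split point, rule 3)."""
    return el.tag == "TeleLine" and el.find("text1") is not None


def needs_split(adlist: ET.Element) -> bool:
    """Return True if an <ADLIST> satisfies rules 1 and 2."""
    items = list(adlist.find("lines"))
    tele = [el for el in items if el.tag == "TeleLine"]
    inet = [el for el in items if el.tag == "InetLine"]
    # Rule 1: (a) >1 TeleLine and >=1 InetLine, or (b) >1 InetLine and >=1 TeleLine
    rule1 = (len(tele) > 1 and len(inet) >= 1) or (len(inet) > 1 and len(tele) >= 1)
    # Rule 2: some TeleLine other than the first one contains <text1>
    rule2 = any(is_text1_teleline(el) for el in tele[1:])
    return rule1 and rule2


def split_points(adlist: ET.Element) -> List[int]:
    """Rule 3: 0-based positions within <lines> of the TeleLines containing <text1>."""
    return [i for i, el in enumerate(adlist.find("lines")) if is_text1_teleline(el)]
```

For the example <ADLIST> further down, needs_split returns True and the split points are the TeleLines holding TTT2, TTT4, TTT5 and TTT6.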

When doing the split, we have to obey the following:

i)   The first <AD> tag contains at LEAST one <TeleLine>
     and at LEAST one <InetLine>, but NOT more than one of both.
     Furthermore, only the first <AD> tag contains the 
     <head> tag from the original XML, and this <AD> tag 
     should have the SEQ attribute set to 'S'.

ii)  The last <AD> tag contains the LAST <TeleLine> tag with 
     a <text1> tag (and any following <InetLine> and/or <TeleLine> 
     tags with NO <text1> tag).
     The last <AD> tag should have the SEQ attribute set to 'E'.

iii) Middle <AD> tags (between the first and the last) should 
     be generated for each <TeleLine> tag that contains a <text1> 
     tag and is NOT the last such tag.
     These <AD> tags should have the SEQ attribute set to 'M'.
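Rules i to iii leave some room for interpretation, so here is a minimal Python sketch (xml.etree.ElementTree, not XSLT) of one reading that reproduces the example output below. The assumptions are ours, not part of the specification: the <ADLIST> has already passed rules 1 and 2; the lines are cut after every <TeleLine> with <text1> except the last one, so the remainder forms the 'E' group; and rule i is read as "if the first group has no <InetLine>, move the following lines into it until it has one". The function names group_lines and split_adlist are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def is_text1_teleline(el: ET.Element) -> bool:
    """True for a <TeleLine> that has a <text1> child."""
    return el.tag == "TeleLine" and el.find("text1") is not None


def group_lines(items: List[ET.Element]) -> List[List[ET.Element]]:
    """Partition the children of <lines> into consecutive groups, one per <AD>."""
    marks = [i for i, el in enumerate(items) if is_text1_teleline(el)]
    groups, start = [], 0
    # Assumption: cut after every <text1> TeleLine except the last one, so the
    # last group holds the last <text1> TeleLine plus whatever follows (rule ii).
    for m in marks[:-1]:
        groups.append(items[start:m + 1])
        start = m + 1
    groups.append(items[start:])
    # Assumption (our reading of rule i): the first group must also hold an
    # <InetLine>, so pull lines from the next group(s) until it does.
    while len(groups) > 1 and not any(el.tag == "InetLine" for el in groups[0]):
        groups[0].append(groups[1].pop(0))
        if not groups[1]:
            del groups[1]
    return groups


def split_adlist(adlist: ET.Element) -> List[ET.Element]:
    """Build the <AD> elements (S, M..., E) for one <ADLIST>."""
    groups = group_lines(list(adlist.find("lines")))
    if len(groups) == 1:
        seqs = ["U"]  # assumption: nothing left to split, treat as one-to-one
    else:
        seqs = ["S"] + ["M"] * (len(groups) - 2) + ["E"]
    head = adlist.find("head")
    ads = []
    for i, (seq, group) in enumerate(zip(seqs, groups)):
        ad = ET.Element("AD", {"SEQ": seq})
        if i == 0 and head is not None:
            ad.append(copy.deepcopy(head))  # only the first <AD> gets <head>
        lines = ET.SubElement(ad, "lines")
        lines.extend([copy.deepcopy(el) for el in group])
        ads.append(ad)
    return ads
```

Applied to the example <ADLIST> below, split_adlist yields four <AD> elements with SEQ values S, M, M and E and the same line groupings as the expected output, so it may be useful as a reference model when checking an <xsl:key>-based XSLT 1.0 solution.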

----------

Sometimes (maybe always) an example says more than 
1000 words of specification, so here's an example:

<ADLIST>
  <head>Head Text</head>
  <lines>
    <TeleLine>
       <text2>TTT1</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT2</text1>
    </TeleLine>
    <InetLine>III1</InetLine>
    <InetLine>III2</InetLine>
    <TeleLine>
       <text2>TTT3</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT4</text1>
    </TeleLine>
    <InetLine>III3</InetLine>
    <TeleLine>
      <text1>TTT5</text1>
    </TeleLine>
    <InetLine>III4</InetLine>
    <TeleLine>
      <text1>TTT6</text1>
    </TeleLine>
    <TeleLine>
      <text2>TTT7</text2>
    </TeleLine>
  </lines>
</ADLIST>

Should be converted to the following sequence of <AD> tags:

<AD SEQ="S">
  <head>Head Text</head>
  <lines>
    <TeleLine>
       <text2>TTT1</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT2</text1>
    </TeleLine>
    <InetLine>III1</InetLine>
  </lines>
</AD>

<AD SEQ="M">
  <lines>
    <InetLine>III2</InetLine>
    <TeleLine>
       <text2>TTT3</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT4</text1>
    </TeleLine>
  </lines>
</AD>

<AD SEQ="M">
  <lines>   
    <InetLine>III3</InetLine>
    <TeleLine>
      <text1>TTT5</text1>
    </TeleLine>
  </lines>
</AD>

<AD SEQ="E">
  <lines>  
    <InetLine>III4</InetLine>
    <TeleLine>
      <text1>TTT6</text1>
    </TeleLine>
    <TeleLine>
      <text2>TTT7</text2>
    </TeleLine>
  </lines>
</AD>
-------

I suppose the solution requires some elaborate use 
of the <xsl:key> tag, but I just can't seem to figure 
it out (believe me, I have tried)!

If anyone out there can help, I would REALLY appreciate 
it (and even buy that someone some excellent Danish beer, 
if he or she should ever visit Aarhus in Denmark)!

/Lars

** Stibo Graphic          | Søren Nymarks Vej 21 | DK-8270 Højbjerg 
** mailto:laes(_at_)stibo(_dot_)com  | http://www.stibographic.com 
** Phone:  +45 8939 8939  | Fax:    +45 8939 8940
** Direct: +45 8939 7421


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



  • Complex splitting of XML tag to multiple other XML tags using XSLT, Lars Eskildsen