RE: Number of scans required ??

I guessed it would be complicated. Here is a short version
of my big XML.

*********************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AEXDATAEXTRACT SYSTEM "AeXDataExtract_2_2.dtd">

<AEXDATAEXTRACT DTD_VERSION="2.2"
    EXTRACT_START_DATETIME="7/15/2003 11:03:25 AM"
    EXTRACT_TYPE="FULL">

  <RESOURCE_TYPE GUID="{493435f7-3b17-4c4c-b07f-c23e7ab7781f}"
      NAME="Computer" DESCRIPTION="Definition for Computer" SOURCE="IS"
      CREATED_DATE="5/21/2003 9:47:08 PM"
      MODIFIED_DATE="5/21/2003 9:53:57 PM" DELETED="0">

    <RESOURCE GUID="{8EDCCB48-AC8D-474C-852B-B3235563CEA7}"
        NAME="P??W?" SOURCE="" SITE_CODE="abc.com" DOMAIN="">

      <INVENTORY>

        <ASSET>

          <IDENTIFICATION>

            <ATTRIBUTE NAME="Name">P??W?</ATTRIBUTE>
            <ATTRIBUTE NAME="Domain" NULL="FALSE" />
            <ATTRIBUTE NAME="Altkey1" NULL="TRUE" />
            <ATTRIBUTE NAME="Altkey2" NULL="TRUE" />

          </IDENTIFICATION>

          <CLASS NAME="Active_Directory_Details" />
          <CLASS NAME="Network_Printer_Details" />
          <CLASS NAME="User_Contact_Details" />
          <CLASS NAME="User_General_Details" />
        </ASSET>

        <CUSTOM>
          <CLASS NAME="FID_OS_System_Info" />
          <CLASS NAME="FID_SW_C2P2" />
          <CLASS NAME="FID_SW_ESM" />
          <CLASS NAME="FID_SW_IE_Patch" />
          <CLASS NAME="FID_SW_Most_Frequent_User" />
          <CLASS NAME="FID_SW_NAV_Management" />
          <CLASS NAME="FID_SW_Tag_File" />
          <CLASS NAME="FID_SW_Virus_Definitions" />
        </CUSTOM>
      </INVENTORY>
    </RESOURCE>
  </RESOURCE_TYPE>
</AEXDATAEXTRACT>

**********************************************************************


In the above XML, the <AEXDATAEXTRACT> element is a table name. I will
generate a primary key for it using generate-id(), which will be its first
column; the second column will be DTD_VERSION, and so on for that table.
The topmost line of the output should carry information like this.

Output
------
AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,


At first sight this is simple:

<xsl:text>AexID,</xsl:text>
<xsl:for-each select="/AEXDATAEXTRACT/@*">
  <xsl:value-of select="name()"/>
  <xsl:text>,</xsl:text>
</xsl:for-each>

But there's a bug in this: it assumes that the order of attributes will
be retained. In fact, the order of attributes in XML is undefined, so
this could output the attributes in any order. If you need the column
names in this order, you are going to have to redesign the source XML
file.
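
If the set of columns is known in advance, one way to avoid depending on
attribute order is to select each attribute by name. A minimal sketch,
assuming only the three attributes shown on AEXDATAEXTRACT in the sample
and that the AexID key comes from generate-id(); the header spelling is
an assumption and simply copies the attribute names:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/AEXDATAEXTRACT">
    <!-- Header row: column names written out in the required order -->
    <xsl:text>AexID,DTD_VERSION,EXTRACT_START_DATETIME,EXTRACT_TYPE&#10;</xsl:text>
    <!-- Data row: each attribute is selected by name, so the order in
         which the parser reports attributes does not matter -->
    <xsl:value-of select="generate-id()"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@DTD_VERSION"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@EXTRACT_START_DATETIME"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@EXTRACT_TYPE"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The value returned by generate-id() is processor-specific and is only
guaranteed to be unique within a single transformation run.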


I want its data in a second scan. I WILL EXPLAIN ABOUT THIS LATER.

Then, when the processor encounters a new table name, i.e. RESOURCE_TYPE,
it will take all the columns for this table and add the generated
parentID to them. So the first line of the output should look like this.


I don't like this "when the processor encounters". It's better to
describe the processing in terms of what output you want to be produced,
and how it is derived from the input, not in terms of a particular order
of processing.

Simplistically, it looks like:

<xsl:template match="RESOURCE_TYPE">
  <xsl:for-each select="@*">
    <xsl:value-of select="name()"/>
    <xsl:text>,</xsl:text>
  </xsl:for-each>
</xsl:template>

But again there is the problem that you are depending on the order of
attributes in XML.
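
Another way to get a fixed column order, without redesigning the source,
is to keep the order in the stylesheet and look each attribute up by
name. The sketch below is only an illustration: the my: namespace and the
my:columns list are hypothetical, and document('') assumes the processor
can re-read the stylesheet from its own URI:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="urn:example:columns"
    exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Hypothetical lookup table: the column order is defined here,
       not by the attribute order in the source document -->
  <my:columns>
    <my:col>GUID</my:col>
    <my:col>NAME</my:col>
    <my:col>DESCRIPTION</my:col>
    <my:col>SOURCE</my:col>
    <my:col>CREATED_DATE</my:col>
    <my:col>MODIFIED_DATE</my:col>
    <my:col>DELETED</my:col>
  </my:columns>

  <xsl:template match="/">
    <!-- Header row, in the order of the lookup table -->
    <xsl:for-each select="document('')/*/my:columns/my:col">
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="//RESOURCE_TYPE"/>
  </xsl:template>

  <xsl:template match="RESOURCE_TYPE">
    <xsl:variable name="row" select="."/>
    <!-- Data row: for each listed column, fetch the attribute of that name;
         a missing attribute yields an empty field -->
    <xsl:for-each select="document('')/*/my:columns/my:col">
      <xsl:value-of select="$row/@*[name() = current()]"/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```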


Output
------- 
AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,Resource_Type_GUID
(or only GUID; both will do even if GUID is there in the other table),
NAME,DESCRIPTION,SOURCE,CREATED_DATE,MODIFIED_DATE,DELETED>


Does the note in parentheses mean that you have a requirement to
eliminate duplicates here, i.e. to include a column once only if it
appears on multiple "tables"? If so, you need to understand how
elimination of duplicates is done in XSLT. This is essentially the same
problem as grouping, and is discussed at
http://www.jenitennison.com/xslt/grouping.
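
As a rough illustration of that technique (Muenchian grouping with
xsl:key), the sketch below lists each attribute name once across
RESOURCE_TYPE and RESOURCE, both of which carry GUID, NAME and SOURCE in
the sample. The key name col-by-name is made up for this example, and the
order in which the names come out is still processor-dependent:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every attribute of the two "table" elements by its name -->
  <xsl:key name="col-by-name"
           match="RESOURCE_TYPE/@* | RESOURCE/@*"
           use="name()"/>

  <xsl:template match="/">
    <!-- Keep an attribute only if it is the first node in its key group,
         so each distinct name is written exactly once -->
    <xsl:for-each select="(//RESOURCE_TYPE/@* | //RESOURCE/@*)
        [generate-id() = generate-id(key('col-by-name', name())[1])]">
      <xsl:value-of select="name()"/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```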


It should do this for all the tables. I have to scan this input XML six
times. I am creating six different outputs, as there are six different
items under the INVENTORY tag, for example ASSET and CUSTOM, and there
are four others like these. Within these there are tables
(IDENTIFICATION and CLASS each mark a separate table); their names are
in the <CLASS> tag's "NAME" attribute, the columns are named by the
ATTRIBUTE tags, and each value is in the ATTRIBUTE element's body.
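
One reading of this structure, sketched in XSLT: every child of INVENTORY
is a group, and each IDENTIFICATION or CLASS inside it is a table. The
stylesheet below only lists the table names per group; it assumes the
element names from the sample and makes no claim about the four groups
that were not shown:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//INVENTORY/*"/>
  </xsl:template>

  <!-- One output line per group under INVENTORY (ASSET, CUSTOM, ...) -->
  <xsl:template match="INVENTORY/*">
    <xsl:value-of select="name()"/>
    <xsl:text>: </xsl:text>
    <xsl:for-each select="IDENTIFICATION | CLASS">
      <xsl:choose>
        <!-- A CLASS table is named by its NAME attribute -->
        <xsl:when test="self::CLASS">
          <xsl:value-of select="@NAME"/>
        </xsl:when>
        <!-- IDENTIFICATION is named by the element itself -->
        <xsl:otherwise>
          <xsl:value-of select="name()"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Unlike attributes, child elements keep their document order, so the
table names come out in the order they appear in the source.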


 <INVENTORY>
       <ASSET>
         <IDENTIFICATION>
           <ATTRIBUTE NAME="Name">P??W?</ATTRIBUTE>
           <ATTRIBUTE NAME="Domain" NULL="FALSE" />
           <ATTRIBUTE NAME="Altkey1" NULL="TRUE" />
           <ATTRIBUTE NAME="Altkey2" NULL="TRUE" />
         </IDENTIFICATION>

         <CLASS NAME="Active_Directory_Details" />
         <CLASS NAME="Network_Printer_Details" />
         <CLASS NAME="User_Contact_Details" />
         <CLASS NAME="User_General_Details" />
    </ASSET>
   </INVENTORY>

THERE MAY BE CASES LIKE TABLE WILL NOT HAVE NE DATA.


Sorry, what is "NE data"?


So where table data is present, I want to have the column names on
the topmost line.


I can't see what the relationship is between your data and the column
names. You're using all sorts of terminology like "parent tags", "inner
tables", etc - you clearly have a lot of understanding of the semantics
of this document structure which you aren't communicating very
effectively.


Then the data corresponding to these columns will be obtained in another
scan. IF IT IS POSSIBLE TO GET THE DATA IN THE SAME SCAN, PLS INFORM ME
HOW TO DO THAT. Then, since the parent tags come only once and the data
for the other inner tables is present in huge numbers, my output will
look like this.


You really shouldn't be worrying about how many scans are done. Get the
code correct and working first, see whether it meets the performance
requirements, and if it doesn't, only then start thinking about how to
make it faster.
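
For what it is worth, a header line and the data lines can come from the
same stylesheet, because the whole source tree is available to every
expression. A minimal sketch for the IDENTIFICATION table only, assuming
every IDENTIFICATION lists the same ATTRIBUTE elements in the same order
as the first one:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Header line: column names taken from the first IDENTIFICATION -->
    <xsl:for-each select="(//IDENTIFICATION)[1]/ATTRIBUTE">
      <xsl:value-of select="@NAME"/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <!-- Data lines: one per IDENTIFICATION element -->
    <xsl:apply-templates select="//IDENTIFICATION"/>
  </xsl:template>

  <xsl:template match="IDENTIFICATION">
    <xsl:for-each select="ATTRIBUTE">
      <!-- An empty ATTRIBUTE gives an empty field; the separator is kept -->
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample document this should give a header of
Name,Domain,Altkey1,Altkey2 followed by one row whose last three fields
are empty.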


Output
---- 
AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,Resource_Type_GUID
(or only GUID; both will do even if GUID is there in the other table),
NAME,DESCRIPTION,SOURCE,CREATED_DATE,MODIFIED_DATE,DELETED>

First line:


Is this all one line? If this ("AexID",2.2,7/15/2003...) is the first
line, then what is the line above
(AexID,DTD_Version,EXTRACT_START_DATETIME...)? Sorry, but you are
confusing me more and more.

Michael Kay


"AexID",2.2,7/15/2003 11:03:25 AM,FULL,
{493435f7-3b17-4c4c-b07f-c23e7ab7781f},Computer, Definition for Computer,
IS,5/21/2003 9:47:08 PM,5/21/2003 9:53:57 PM,0,
{8359DF92-1E29-409D-8189-79BE7C411171},
{493435f7-3b17-4c4c-b07f-c23e7ab7781f},0001026361C5,,abc.com,WORKGROUP,
Win32,unknown,,0,0,,,IDAUJCFI,{8359DF92-1E29-409D-8189-79BE7C411171},
IDAVJCFI,IDAUJCFI,IDENTIFICATION,IDAWJCFI,IDAVJCFI,0001026361C5,
WORKGROUP,,

```````{5F1D1043-F808-4AB8-A35F-9DE1DE448F41}`
{493435f7-3b17-4c4c-b07f-c23e7ab7781f}`216.16.236.246``abc.fmr.com`
WORKGROUP````````IDA22CFI`{5F1D1043-F808-4AB8-A35F-9DE1DE448F41}`
IDA32CFI`IDA22CFI`IDENTIFICATION`IDA42CFI`IDA32CFI`172.26.45.73`
WORKGROUP``



As you can see in the above output, wherever data is not there
I keep the field blank and emit only the separators, i.e. ,,

I hope now you will get what I am trying to say.

Eagerly waiting for your reply.

Regards,
Dipesh




Date: Fri, 8 Aug 2003 12:03:23 +0100
From: "Michael Kay" <mhk(_at_)mhk(_dot_)me(_dot_)uk>
Subject: RE: [xsl] Number of scans required ??


Thanks a lot for replying.

Well, my document is big; that's why I haven't pasted it there.
But I can generalize how it is, and then I think it will give you
a proper idea.

<RootNode>
<FirstChild> Some Attributes which are columns : </FirstChild> <A> 
More column names as attributes. <B> More column for this table 
(corresponding to DB) One or two more levels of indentation

like this.

</B>
</A>
</FirstChild>


No, I'm afraid this doesn't give me a proper idea at all. Are 
your columns represented by attributes or elements? You've 
said attributes, but in that case why aren't they within the 
start tag?

Secondly you have a structure here that is four levels deep 
(plus one or two), yet you are using the terminology of 
tables and columns to describe it. That doesn't fit.

Please produce a cut-down example of your actual document. 
Or perhaps a schema/DTD. I don't understand it from this 
description at all.

Finally, I don't know what you mean by a "scan". I suggest 
you concentrate on writing some correct code first, and then 
worry about how many times the XSLT processor is scanning 
your source document. Apart from anything else, since the 
data is in memory, the number of times the document is 
scanned is not necessarily critical to performance.

Michael Kay


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
