RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl

2006-06-22 20:51:02
Thanks a lot for the xsl, Michael.

My CSV has commas inside some cells - in those cases the entire cell value
is enclosed in quotes. A simple tokenize that splits at comma boundaries
would therefore not work, so I replaced the tokenize for the cells with a
regex that takes care of the quotes (is there any alternative here other than
using a regex?). I had to specify the quotes in the regex as &amp;quot;
After this, it started taking 45 minutes to transform a 20-column, 35-row
CSV.

The next problem I found was that, in columns whose values can contain commas,
not all cells are enclosed in quotes - only those cells that actually contain
a comma are quoted. So I changed the regex to allow for zero or more quotes.
Now it transformed in 45 secs - surprise?
But even now, I see that the zero-or-more-quotes regex gets thrown off and the
CSV is parsed incorrectly, resulting in the wrong xml content.

So I made some changes and the current xsl has the regex as:
<xsl:analyze-string select="."
regex="(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),&quot;*(.*)&quot;*,(.*),&quot;*(.*)&quot;*,(.*),(.*),&quot;*($.*)&quot;*,(.*)">

(Now it is taking even more time - 1 hour+ and still not done. Let's see if
at least the xml comes out correctly.)

Any suggestions to mitigate this regex complexity, which is caused by the
non-uniformity of the input CSV?

Or am I better off asking the provider of the CSV to keep it uniform, so
that all cells in a column are either with or without quotes?


Thanks,

Vish.
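
A note on the question above about alternatives to one regex for the whole
row: a pattern made of twenty greedy (.*) groups can backtrack very heavily,
which may explain the run times reported. One possible alternative, sketched
here as an assumption and not taken from the thread, is to match one cell at
a time, where a cell is either a quoted string or a run of non-comma
characters. The function name f:split-csv-line, the namespace
urn:example:csv and the template name main are hypothetical, and the sketch
assumes cells never contain escaped (doubled) quotes or line breaks.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:csv"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: split one CSV line into its cells.
       A comma is appended so that every cell, including the last,
       is followed by a comma and the regex never matches an empty string. -->
  <xsl:function name="f:split-csv-line" as="xs:string*">
    <xsl:param name="line" as="xs:string"/>
    <xsl:analyze-string select="concat($line, ',')"
                        regex='("([^"]*)"|([^,]*)),'>
      <xsl:matching-substring>
        <!-- group 2 holds a quoted cell without its quotes,
             group 3 an unquoted cell; the unused group is empty -->
        <xsl:sequence select="concat(regex-group(2), regex-group(3))"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Small demonstration on a literal line: a,"b,c",,d -->
  <xsl:template name="main">
    <row>
      <xsl:for-each select="f:split-csv-line('a,&quot;b,c&quot;,,d')">
        <cell><xsl:value-of select="."/></cell>
      </xsl:for-each>
    </row>
  </xsl:template>

</xsl:stylesheet>
```

Started from the named template main in an XSLT 2.0 processor, this should
produce four cells: a, then b,c, then an empty cell, then d. Because each
match consumes exactly one cell, it does not matter whether a given column is
quoted in every row or only in some of them.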

-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com]
Sent: Thursday, June 22, 2006 12:43 AM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details
in xsl

Can anybody suggest how to convert CSV data in the format

Field1,Field2
Value11,Value12

to xml like

<Field1>Value11</Field1>
<Field2>Value12</Field2>

without hardcoding the fieldnames in the xsl?

<xsl:variable name="lines" as="xs:string*"
             select="tokenize(unparsed-text($input-file), '\r?\n')"/>
<xsl:variable name="field-names" as="xs:string*"
             select="tokenize($lines[1], ',')"/>
<xsl:for-each select="subsequence($lines, 2)">
<row>
 <xsl:variable name="cells" select="tokenize(., ',')"/>
 <xsl:for-each select="$cells">
   <xsl:variable name="p" as="xs:integer" select="position()"/>
   <!-- the braces make the element name an attribute value template -->
   <xsl:element name="{$field-names[$p]}">
     <xsl:value-of select="."/>
   </xsl:element>
 </xsl:for-each>
</row>
</xsl:for-each>

Michael Kay
http://www.saxonica.com/



I was thinking of something like

<xsl:for-each select="tokenize(., ',')"> &lt;<xsl:value-of
select="item-at($elementNames,index-of(?parent of current
node?,.))"/>&gt; <xsl:value-of select="."/>
&lt;/<xsl:value-of
select="item-at($elementNames,index-of(?parent of current
node?,.))"/>&gt; </xsl:for-each>

where elementNames is a tokenized list of the fieldnames -
but I am unable to get it to work.



-----Original Message-----
From: Pantvaidya, Vishwajit
Sent: Wednesday, June 21, 2006 12:17 AM
To: 'xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com'
Subject: [xsl] Converting CSV to XML without hardcoding
schema details
in xsl

Hello,

I am trying to convert a CSV datafile into XML format.
The headers for the CSV data are in a file header.csv e.g.
Field1,Field2
The data is in a file Data.csv e.g.
Value11,Value12
Value21,Value22

I need to convert the CSV data into xml output by creating
xml elements
using the names in the csv header and taking the
corresponding values
from the data file, so that I get an xml as follows:

<doc>
<line>
<Field1>Value11</Field1>
<Field2>Value12</Field2>
</line>
<line>
<Field1>Value21</Field1>
<Field2>Value22</Field2>
</line>
</doc>

I was trying to see if I can do this without hardcoding the header
names in the xsl. I got as far as the xsl below:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:op="http://www.w3.org/2001/12/xquery-operators"
   xmlns:xf="http://www.w3.org/2001/12/xquery-functions"
version="2.0">

   <xsl:output  name="xmlFormat" method="xml" indent="yes"
omit-xml-declaration="yes"/>

   <xsl:variable name="source1" select="'data.csv'"/>
   <xsl:variable name="elementNamesList" select="'Header.csv'"/>
   <xsl:variable name="encoding" select="'iso-8859-1'"/>

   <xsl:variable name="elementNames"
select="tokenize(unparsed-text($elementNamesList,$encoding),',')"/>
   <xsl:variable name="src">
       <doc>
           <xsl:for-each
select="tokenize(unparsed-text($source1,$encoding), '\r?\n')">
               <line>
                   <xsl:for-each select="tokenize(., ',')">
                       &lt;<xsl:value-of
select="op:item-at($elementNames,index-of(?parent of current
node?,.))"/>&gt;
                           <xsl:value-of select="."/>
                           &lt;/<xsl:value-of
select="item-at($elementNames,3)"/>&gt;
                   </xsl:for-each>
               </line>
           </xsl:for-each>
       </doc>
   </xsl:variable>

   <xsl:template match="/">
       <xsl:result-document format = "xmlFormat" href = "src1.xml">
           <xsl:copy-of select="$src"/>
       </xsl:result-document>
   </xsl:template>

</xsl:stylesheet>

In the as-yet incomplete statement <xsl:value-of
select="op:item-at($elementNames,index-of(?parent of current
node?,.))"/>, I am trying to generate an xml element with the Nth field
name from the header name list for the Nth field value. A couple of
issues/questions here:

- I am getting the error "Cannot find a matching 2-argument function
named {http://www.w3.org/2001/12/xquery-operators}item-at()" when I try
to validate the xsl. What could be the reason?

- How can I get the ?parent of current node? that is needed to compute the
index of the current data in the data record?

- Is there any better way to do it? Any way that I can do the
same using xsl:element?

In general, is this the only/best way, or is there a better way
to achieve the same goal?


Thanks and Regards,

Vish.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
