xsl-list

RE: [xsl] Tokenizing and transforming a CSV file

2009-02-25 11:53:57
I would use xsl:analyze-string rather than tokenize(), with a regex such as

(,"[^"]*")|(,[^,]*)

Michael Kay 
http://www.saxonica.com/
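
As an illustration of the suggestion above, here is a minimal sketch of how xsl:analyze-string might be applied to each line. It is not code from the reply itself, and it rests on a few assumptions: the regex only matches fields that start with a comma, so a comma is prepended to each line; the capture groups are moved inside the delimiters so that regex-group() returns the field text without the comma or quotes; blank lines (such as the empty token after a trailing newline) are skipped; and doubled quotes ("") inside a quoted field are not handled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

   <xsl:output method="xml" indent="yes" />

   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

   <xsl:template match="/">
      <result>
         <!-- Assumption: skip blank lines, e.g. the empty token after a trailing newline -->
         <xsl:for-each select="tokenize($filedata, '\r?\n')[normalize-space()]">
            <record>
               <!-- Prepend a comma so that every field, including the first, starts with one -->
               <xsl:analyze-string select="concat(',', .)"
                                   regex=',"([^"]*)"|,([^,]*)'>
                  <xsl:matching-substring>
                     <!-- Group 1 holds a quoted field, group 2 an unquoted one;
                          the group that did not match is a zero-length string -->
                     <field>
                        <xsl:value-of select="concat(regex-group(1), regex-group(2))" />
                     </field>
                  </xsl:matching-substring>
               </xsl:analyze-string>
            </record>
         </xsl:for-each>
      </result>
   </xsl:template>

</xsl:stylesheet>
```

The quoted alternative is listed first so that it is preferred whenever a field opens with a double quote. Text that matches neither alternative (for example, stray characters after a closing quote) is silently dropped here because there is no xsl:non-matching-substring branch.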

-----Original Message-----
From: Mukul Gandhi [mailto:gandhi(_dot_)mukul(_at_)gmail(_dot_)com] 
Sent: 25 February 2009 16:44
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Tokenizing and transforming a CSV file

Hi all,
  I have a CSV file named test.csv, as follows (two example 
lines/records are shown below):

hi,"this is a long string, please tokenize me",hello,world 
hello,please tokenize me,hi there

I want this to be transformed to following XML:

<result>
   <record>
      <field>hi</field>
      <field>this is a long string, please tokenize me</field>
      <field>hello</field>
      <field>world</field>
   </record>
   <record>
      <field>hello</field>
      <field>please tokenize me</field>
      <field>hi there</field>
   </record>
</result>

That is, each line/record should be split on commas, except 
that a comma inside a double-quoted string should not be 
treated as a delimiter.

Below is my attempt so far.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                       version="2.0">

   <xsl:output method="xml" indent="yes" />

   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

   <xsl:template match="/">
      <result>
         <xsl:for-each select="tokenize($filedata, '\r?\n')">
            <record>
               <xsl:for-each select="tokenize(., ',')">
                  <field>
                     <xsl:value-of select="." />
                  </field>
               </xsl:for-each>
            </record>
         </xsl:for-each>
      </result>
   </xsl:template>

</xsl:stylesheet>

The above stylesheet produces the following output:

<result>
   <record>
      <field>hi</field>
      <field>"this is a long string</field>
      <field> please tokenize me"</field>
      <field>hello</field>
      <field>world</field>
   </record>
   <record>
      <field>hello</field>
      <field>please tokenize me</field>
      <field>hi there</field>
   </record>
</result>

As per my requirement, the following output fragment

<field>"this is a long string</field>
<field> please tokenize me"</field>

is wrong.

This should actually appear as:

<field>this is a long string, please tokenize me</field>

I would appreciate any help regarding this problem.

I am using XSLT 2.0 with Saxon 9.x.


--
Regards,
Mukul Gandhi

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


