xsl-list

[xsl] Tokenizing and transforming a CSV file

2009-02-25 11:44:54
Hi all,
  I have a CSV file (named test.csv) like the following (two example
lines/records are shown below):

hi,"this is a long string, please tokenize me",hello,world
hello,please tokenize me,hi there

I want this to be transformed to the following XML:

<result>
   <record>
      <field>hi</field>
      <field>this is a long string, please tokenize me</field>
      <field>hello</field>
      <field>world</field>
   </record>
   <record>
      <field>hello</field>
      <field>please tokenize me</field>
      <field>hi there</field>
   </record>
</result>

i.e., each line/record should be tokenized on commas, with the
restriction that a comma inside a double-quoted string should not be
treated as a delimiter.

Below is my attempt so far.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                       version="2.0">

   <xsl:output method="xml" indent="yes" />

   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

   <xsl:template match="/">
      <result>
        <xsl:for-each select="tokenize($filedata, '\r?\n')">
          <record>
            <xsl:for-each select="tokenize(., ',')">
              <field>
                <xsl:value-of select="." />
              </field>
            </xsl:for-each>
          </record>
        </xsl:for-each>
      </result>
   </xsl:template>

</xsl:stylesheet>

The above stylesheet produces the following output:

<result>
   <record>
      <field>hi</field>
      <field>"this is a long string</field>
      <field> please tokenize me"</field>
      <field>hello</field>
      <field>world</field>
   </record>
   <record>
      <field>hello</field>
      <field>please tokenize me</field>
      <field>hi there</field>
   </record>
</result>

As per my requirement, the following output fragment

<field>"this is a long string</field>
<field> please tokenize me"</field>

is wrong.

This should actually appear as:

<field>this is a long string, please tokenize me</field>

I would appreciate any help regarding this problem.

I am using XSLT 2.0 with Saxon 9.x.
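
To make the requirement concrete, here is a minimal sketch of one possible
direction, not a definitive solution: replace the inner tokenize() with
xsl:analyze-string, using a regex that treats a double-quoted run as a single
field. It rests on assumptions not stated above: quoted fields contain no
escaped ("") quotes, unquoted fields contain no double quotes, and no field
spans more than one line. The [. != ''] predicate is also an addition, meant
to skip blank lines such as the one left by a trailing newline.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

   <xsl:output method="xml" indent="yes" />

   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

   <xsl:template match="/">
      <result>
        <!-- Assumption: blank lines carry no record, so they are skipped -->
        <xsl:for-each select="tokenize($filedata, '\r?\n')[. != '']">
          <record>
            <!-- A comma is appended so that every field, including the
                 last one, is terminated by a comma. Group 1 is either a
                 whole double-quoted string or a run of characters that
                 are neither quotes nor commas. -->
            <xsl:analyze-string select="concat(., ',')"
                                regex='("[^"]*"|[^",]*),'>
              <xsl:matching-substring>
                <field>
                  <!-- Strip the enclosing double quotes, if any -->
                  <xsl:value-of
                     select="replace(regex-group(1), '^&quot;|&quot;$', '')" />
                </field>
              </xsl:matching-substring>
            </xsl:analyze-string>
          </record>
        </xsl:for-each>
      </result>
   </xsl:template>

</xsl:stylesheet>
```

For the two sample lines, the intent is that the quoted string comes out as a
single field with its comma intact. Text that does not fit the pattern (for
example, a stray quote inside an unquoted field) falls into the non-matching
part and is silently dropped, so real-world CSV would need a more careful
pattern.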


-- 
Regards,
Mukul Gandhi
