Hi all,
I have a CSV file (named test.csv) like the following (two
lines/records are shown as an example):
hi,"this is a long string, please tokenize me",hello,world
hello,please tokenize me,hi there
I want this to be transformed to the following XML:
<result>
<record>
<field>hi</field>
<field>this is a long string, please tokenize me</field>
<field>hello</field>
<field>world</field>
</record>
<record>
<field>hello</field>
<field>please tokenize me</field>
<field>hi there</field>
</record>
</result>
i.e., each line/record should be tokenized on commas, with the
restriction that a comma inside a double-quoted string should not be
treated as a delimiter.
Below is my attempt so far.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" />
  <xsl:variable name="filedata" select="unparsed-text('test.csv')" />
  <xsl:template match="/">
    <result>
      <xsl:for-each select="tokenize($filedata, '\r?\n')">
        <record>
          <xsl:for-each select="tokenize(., ',')">
            <field>
              <xsl:value-of select="." />
            </field>
          </xsl:for-each>
        </record>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
The above stylesheet produces the following output:
<result>
<record>
<field>hi</field>
<field>"this is a long string</field>
<field> please tokenize me"</field>
<field>hello</field>
<field>world</field>
</record>
<record>
<field>hello</field>
<field>please tokenize me</field>
<field>hi there</field>
</record>
</result>
As per my requirement, the following output fragment
<field>"this is a long string</field>
<field> please tokenize me"</field>
is wrong.
This should actually appear as:
<field>this is a long string, please tokenize me</field>
I would appreciate any help regarding this problem.
I am using XSLT 2.0 with Saxon 9.x.
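One possible direction, offered only as a minimal sketch and not a confirmed solution: the inner tokenize(., ',') could be replaced with xsl:analyze-string, using a regex that matches either a double-quoted field or a run of characters containing no comma and no quote. The regex and the [. ne ''] filter are assumptions introduced for illustration. The sketch does not handle escaped quotes ("") inside a quoted field, and it silently skips empty fields (e.g. a,,b). Like the stylesheet above, it matches on "/" and so still needs some source document to run against.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" />
  <xsl:variable name="filedata" select="unparsed-text('test.csv')" />
  <xsl:template match="/">
    <result>
      <!-- Assumption: drop empty lines, e.g. the empty token produced
           after a trailing newline at the end of the file. -->
      <xsl:for-each select="tokenize($filedata, '\r?\n')[. ne '']">
        <record>
          <!-- Group 1: content of a double-quoted field (commas allowed).
               Group 2: an unquoted field (no commas, no quotes).
               The delimiting commas are non-matching substrings and are
               simply ignored. -->
          <xsl:analyze-string select="." regex='"([^"]*)"|([^,"]+)'>
            <xsl:matching-substring>
              <field>
                <!-- Only one of the two groups participates in a match;
                     the other is the zero-length string. -->
                <xsl:value-of select="concat(regex-group(1), regex-group(2))" />
              </field>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </record>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

For the two sample records above, the quoted field is matched by the first alternative as a whole, so its embedded comma stays inside a single <field> and the surrounding quotes are dropped.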
--
Regards,
Mukul Gandhi