xsl-list

Re: [xsl] csv to xml converter bug

2007-07-11 05:11:00
On 7/10/07, Andrew Welch <andrew(_dot_)j(_dot_)welch(_at_)gmail(_dot_)com> wrote:
> On 7/10/07, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
>> Haven't worked out the detail, but it seems to me that if you add a trailing
>> comma at the end of the string, you can then do
>>
>> <xsl:analyze-string select="concat($in, ',')" regex='("[^"]*"|[^,]*),'>
>>   <xsl:matching-substring>
>>     <token><xsl:value-of select="regex-group(1)"/></token>
>>   </xsl:matching-substring>
>> </xsl:analyze-string>
>
> Hmm, seems to work.
>
>> Doesn't strip the quotes off, but that part's easy.
>
> It is, especially as Abel wrote it for me :)
>
> I'll try it out and then write it up, thanks!
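To try the suggestion outside XSLT, here is a rough Python port of the quoted regex. It is only a sketch for experimenting, not code from the thread, and it assumes Python's re module treats these simple patterns the same way XPath regexes do. It shows that the appended comma lets the last field match like any other, that quoted values keep their quotes, and that a value with doubled quotes such as "foo, ""bar""" gets split in the wrong place. The names KAY_TOKEN and kay_tokens are made up for this example.

```python
import re

# Python stand-in for the regex in the quoted xsl:analyze-string: group 1 is
# either one quoted value or an unquoted run, and the comma is left outside it.
KAY_TOKEN = re.compile(r'("[^"]*"|[^,]*),')


def kay_tokens(line: str) -> list[str]:
    """Tokenise one CSV line the way the quoted suggestion does."""
    # Appending "," means the last field also ends in a comma, so one
    # pattern covers every field. finditer scans left to right without
    # overlaps, roughly like xsl:analyze-string's matching-substring.
    return [m.group(1) for m in KAY_TOKEN.finditer(line + ",")]


print(kay_tokens('a,"b,c",d'))           # -> ['a', '"b,c"', 'd']
print(kay_tokens('x,"foo, ""bar""",y'))  # -> ['x', '"foo', ' ""bar"""', 'y']
```

The second call is the case the modified version below has to handle: the first quoted alternative cannot match across the doubled quote, so the regex falls back to the unquoted branch and stops at the embedded comma.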

I had to modify it to cope with nested quotes (quotes escaped by doubling them), such as "foo, ""bar""". This is what I came up with:

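<!-- Assumes xs and fn are bound on xsl:stylesheet; fn must be a namespace of
     your own, since the standard XPath functions namespace is reserved. -->
<xsl:function name="fn:getTokens" as="xs:string+">
 <xsl:param name="str" as="xs:string"/>
 <!-- The appended comma means every field, including the last, ends in ",".
      Group 1 is either a run of quoted chunks or an unquoted run. -->
 <xsl:analyze-string select="concat($str, ',')" regex='(("[^"]*")+|[^,]*),'>
   <xsl:matching-substring>
     <!-- Drop the outer quotes and turn each "" back into " -->
     <xsl:sequence select='replace(regex-group(1), "^""|""$|("")""", "$1")'/>
   </xsl:matching-substring>
 </xsl:analyze-string>
</xsl:function>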

I think it's a neat use of regex-group: group 1 captures either side of
the | (the quoted or the unquoted value) but not the trailing comma.
Any comments welcome.
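For anyone who wants to poke at the two regexes in fn:getTokens without running a transform, here is a rough Python port. It is a sketch under the assumption that Python's re behaves like XPath regexes for these patterns, not the transform linked below, and the names FIELD, UNQUOTE and get_tokens are made up. The + on the quoted alternative lets a field be read as several adjacent quoted chunks, and the second regex removes the outer quotes and collapses each "" to a single ".

```python
import re

# Python stand-ins for the two regexes in fn:getTokens (names are made up).
# FIELD: group 1 is one or more adjacent quoted chunks, so "foo, ""bar"""
# is read as "foo, " + "bar" + "", or else an unquoted run; the comma stays
# outside group 1.
FIELD = re.compile(r'(("[^"]*")+|[^,]*),')
# UNQUOTE: a leading quote, a trailing quote, or a doubled quote whose first
# half is captured. \Z is used instead of $ because XPath's $ (without the
# m flag) only matches at the very end of the string.
UNQUOTE = re.compile(r'^"|"\Z|(")"')


def get_tokens(line: str) -> list[str]:
    """Split one CSV line and unquote each field, as fn:getTokens does."""
    tokens = []
    for m in FIELD.finditer(line + ","):
        # XPath's replace() turns a non-participating $1 into "", so do the
        # same here: outer quotes vanish and each "" collapses to ".
        tokens.append(UNQUOTE.sub(lambda u: u.group(1) or "", m.group(1)))
    return tokens


print(get_tokens('x,"foo, ""bar""",y'))  # -> ['x', 'foo, "bar"', 'y']
```

Empty fields come through as empty strings, so 'a,"b,c",,d' gives ['a', 'b,c', '', 'd'].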

I've posted the complete transform here:
http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html

cheers
andrew
--
http://andrewjwelch.com

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--