Hi Andrew,
I am not sure I fully understand your point about QNames and
normalize-space. If the header line is missing from the csv, taking
line 1 and trying to derive element or attribute names from it is bound to
produce invalid QName errors, for example when a cell holds an amount or a
dollar value. As I currently understand it, one would have to test
whether line 1 provides headers, and fall back to other options if it does not.
I would, for example, use something like
<xsl:function name="fn:get-headers" as="xs:string+">
  <!-- optional header line supplied by the caller; may be empty -->
  <xsl:param name="param"/>
  <xsl:param name="line1"/>
  <xsl:param name="line2"/>
  <!-- use $param if given, else line 1 if all of its tokens look like names -->
  <xsl:variable name="headerline" select="if (string($param)) then $param
      else if (every $x in fn:get-tokens($line1) satisfies ($x castable as xs:QName))
      then $line1 else ''"/>
  <xsl:variable name="headers" select="fn:get-tokens($headerline)"/>
  <!-- one name per column, falling back to col1, col2, ... -->
  <xsl:for-each select="1 to max((count(fn:get-tokens($line2)), count($headers)))">
    <xsl:variable name="pos" select="position()"/>
    <xsl:sequence select="if ($headers[$pos] castable as xs:QName)
        then $headers[$pos] else concat('col', string($pos))"/>
  </xsl:for-each>
</xsl:function>
to get the headers, covering most cases of missing, or partly missing,
or not missing column headers, with the option of providing a set of
column header names as a parameter, assuming an invocation, from your
example code, like
<xsl:variable name="names" select="fn:get-headers($headers, $lines[1],
$lines[2])" as="xs:string+"/>
where this $headers is a parameter to the main "csv2xml" template, which
could be empty or a string similar to the expected csv header line.
The fact that there may not be a header line in the csv file also
implies that the line
<xsl:for-each select="$lines[position() > 1]">
may have to be changed to something like
<xsl:for-each select="$lines[position() > (if (every $x in
fn:get-tokens($lines[1]) satisfies ($x castable as xs:QName)) then 1 else 0)]">
for example.
'&#xa;' indeed displays as a space in html, but wouldn't '\r?\n' be more
portable?
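To illustrate what I mean by portable, here is a small self-contained sketch. The sample text and the "main" template name are made up for the example, and the blank-line filter is my own addition rather than anything from your stylesheet. It splits on either Windows (CRLF) or Unix (LF) line ends and drops blank lines, such as an empty first line:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Run as a named template, e.g. with an initial template of "main". -->
  <xsl:template name="main">
    <!-- Illustrative sample: blank first line, then CRLF and LF line ends mixed. -->
    <xsl:variable name="csv" select="'&#xD;&#xA;a,b&#xD;&#xA;1,2&#xA;3,4'"/>
    <!-- Split on an optional CR followed by LF, then keep only non-blank lines. -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize($csv, '\r?\n')[normalize-space(.)]"/>
    <!-- Three non-blank lines remain: a,b / 1,2 / 3,4 -->
    <xsl:value-of select="count($lines)"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that this pattern would still not split old Mac-style files that use a bare carriage return as the line end.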
As a note on extending your example, the names for <root> and <row> could
be parametrized. I think I would also move <root> further outside the
nested code and allow $csvpath to be a space-delimited list of file paths,
so that several csv files can easily be merged into one tree by simply
looping over the tokenized file paths.
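A rough sketch of that idea follows. The parameter names ($csvpaths, $rootname, $rowname), the default file names and the "source" attribute are all hypothetical, and the per-cell handling from your stylesheet is replaced here by simply copying each line, so this only shows the outer looping:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameters; names and defaults are illustrative only. -->
  <xsl:param name="csvpaths" as="xs:string" select="'a.csv b.csv'"/>
  <xsl:param name="rootname" as="xs:string" select="'root'"/>
  <xsl:param name="rowname" as="xs:string" select="'row'"/>

  <xsl:template name="main">
    <!-- The root element is created once, outside the per-file loop. -->
    <xsl:element name="{$rootname}">
      <xsl:for-each select="tokenize(normalize-space($csvpaths), ' ')">
        <xsl:variable name="path" select="."/>
        <!-- Skip paths that cannot be read instead of failing the whole run. -->
        <xsl:if test="unparsed-text-available($path)">
          <xsl:for-each
              select="tokenize(unparsed-text($path), '\r?\n')[normalize-space(.)]">
            <xsl:element name="{$rowname}">
              <!-- Record which file the row came from. -->
              <xsl:attribute name="source" select="$path"/>
              <!-- Placeholder: the real cell splitting would go here. -->
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:for-each>
        </xsl:if>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

One limitation of a space-delimited list is that the file paths themselves cannot contain spaces; another delimiter could be used if that matters.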
Your code offers a good basic design and I especially like your regex
token grabber.
Thank you,
ac
On 3 February 2010 04:11, ac <ac(_at_)hyperbase(_dot_)com> wrote:
Hi,
Andrew, your code is fine but it seems that, to read lines, the line
<xsl:variable name="lines" select="tokenize($csv, ' ')"
as="xs:string+" />
should be more like
<xsl:variable name="lines" select="tokenize($csv, '\r?\n')"
as="xs:string+" />
as there could be spaces in the cells, and as the end-of-line would not be
recognized anyway.
What do you think?
That's from it being displayed as html (which I should probably
fix)... if you use the download link to get the file instead then you
can see that it tokenizes on a carriage return:
<xsl:variable name="lines" select="tokenize($csv, '&#xa;')" as="xs:string+"/>
Also, but purely as a matter of taste and case, since all cell values are
strings, I would tend to use attributes, replacing
<elem name="{.}">
<xsl:value-of select="$lineItems[$pos]" />
</elem>
with
<xsl:attribute name="{.}" select="$lineItems[$pos]"/>
See below.....
Finally, one may also have to handle the case of csv files that do not have
an initial header line with all valid QName strings.
It can - that's why the name is stored as a name attribute on a
general <elem> element, so that they don't have to be QNames.
I get a lot of csv files (e.g. from the bank) where the first line is a
blank line
Use normalize-space on the entire csv text (the string returned from
unparsed-text) then use it again on the column names.
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--