xsl-list

[xsl] Using xsl:analyze-string and regex to parse long lines with white-space

2007-06-19 11:40:36
Hi All,

I have an input file "input.xml":

input.xml
-------------
<pfarr>
        <pfstring name="dlsite">
q330 0000 345 1169760599.99999 TA_D03A 921 47 -123 0.0325 regular internet hosted 1172293472.07035
q330 0123 234 9999999999.99900 TA_HAST 1005 36 -121 0.5558 regular internet hosted 1172293966.53652
q330 0234 123 1157317200.00000 TA_U04C 718 36 -120 0.7886 vsat spacenet 1172298386.07728
        </pfstring>
</pfarr>

I am trying to parse the contents of <pfstring> to get the 5th column ("TA_D03A" in the example), the 10th ("regular internet") and the 11th ("hosted") for each line and push it to "output.xml" thus:

output.xml
---------------
<dlsites>
        <site name="TA_D03A">
                <comt>regular internet</comt>
                <comp>hosted</comp>
        </site>
        <site name="TA_HAST">
                <comt>regular internet</comt>
                <comp>hosted</comp>
        </site>
        <site name="TA_U04C">
                <comt>vsat</comt>
                <comp>spacenet</comp>
        </site>
</dlsites>

Each entry in input.xml/pfarr/pfstring is on a new line. I am trying to use the regex functions and have the following, but it does not seem to be working:

transform.xsl
-----------------
<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

<xsl:template match="/">
    <dlsites>
        <xsl:apply-templates select="/pfarr/pfstring" />
    </dlsites>
</xsl:template>

<xsl:template match="pfstring[@name = 'dlsite']">
    <xsl:variable name="elValue" select="." />

    <xsl:analyze-string select="$elValue" regex="\s*(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+\n">

        <xsl:matching-substring>
            <xsl:variable name="dlname" select="regex-group(5)" />
            <site name="{@dlname}">
                <comt><xsl:value-of select="regex-group(10)"/></comt>
                <comp><xsl:value-of select="regex-group(11)"/></comp>
            </site>
        </xsl:matching-substring>

        <xsl:non-matching-substring>
            <unknown>
                <xsl:value-of select="$elValue"/>
            </unknown>
        </xsl:non-matching-substring>

    </xsl:analyze-string>

</xsl:template>

</xsl:stylesheet>

Is this the most efficient way of processing this type of file? It is highly likely that I have something wrong in the regex section - any pointers would be appreciated. The XSLT processor I am using is Saxon 8.9J.
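
One complication in the sample data: column 10 sometimes contains an embedded space ("regular internet") and sometimes does not ("vsat"), so a fixed number of whitespace-separated groups cannot line up for every record. For comparison, here is a minimal, untested sketch that uses tokenize() rather than xsl:analyze-string. It assumes that each record sits on its own line, that column 11 and the trailing timestamp are always single tokens, and that everything between token 9 and those last two tokens belongs to column 10. The variable names $tok and $n are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

<xsl:template match="/">
    <dlsites>
        <xsl:apply-templates select="/pfarr/pfstring[@name = 'dlsite']" />
    </dlsites>
</xsl:template>

<xsl:template match="pfstring[@name = 'dlsite']">
    <!-- Split the element content into lines and skip blank ones -->
    <xsl:for-each select="tokenize(., '\n')[normalize-space(.)]">
        <!-- Collapse runs of white-space, then split the record into tokens -->
        <xsl:variable name="tok" select="tokenize(normalize-space(.), ' ')" />
        <xsl:variable name="n" select="count($tok)" />
        <!-- Assumption: token 5 is the site name -->
        <site name="{$tok[5]}">
            <!-- Assumption: column 10 runs from token 10 up to the token
                 before the last two (it may be one word or several) -->
            <comt>
                <xsl:value-of select="$tok[position() ge 10 and position() le $n - 2]" separator=" " />
            </comt>
            <!-- Assumption: column 11 is the second-to-last token -->
            <comp><xsl:value-of select="$tok[$n - 1]" /></comp>
        </site>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Counting from the end of each record as well as from the start is what lets the variable-width column 10 be recovered; if other columns can also contain spaces, this sketch would need a different rule.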

Thanks in advance!
- Rob Newman
