Re: [xsl] How to determine the end-of-line marker in unparsed-text?

On 9 May 2016 at 18:17, Costello, Roger L. costello(_at_)mitre(_dot_)org
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi Folks,

I am writing an XSLT program to read unparsed-text. I want my program to work 
regardless of the end-of-line marker used by the unparsed-text.

Here are the end-of-line markers that are typically used in files, I think:

        CR + LF
        LF
        CR

Below is code that I wrote to determine the end-of-line marker used in the 
unparsed-text. Is there a better way to determine the end-of-line marker?

Note: $file is a variable that contains the unparsed-text.

<xsl:variable name="end-of-line">
        <xsl:choose>
                <!-- CR + LF (decimal 13 followed by decimal 10) -->
                <xsl:when test="contains($file, codepoints-to-string((13, 10)))">
                        <xsl:value-of select="codepoints-to-string((13, 10))"/>
                </xsl:when>
                <!-- LF  (decimal 10) -->
                <xsl:when test="contains($file, codepoints-to-string(10))">
                        <xsl:value-of select="codepoints-to-string(10)"/>
                </xsl:when>
                <!-- CR  (decimal 13) -->
                <xsl:when test="contains($file, codepoints-to-string(13))">
                        <xsl:value-of select="codepoints-to-string(13)"/>
                </xsl:when>
                <!-- Perhaps the input file consists of just one line and there is no end-of-line marker!
                     What would be an appropriate value in this situation? An error? -->
                <xsl:otherwise>
                        <xsl:value-of select="error(QName('http://example.com/', 'EOL-err'), 'No end-of-line symbol')"/>
                </xsl:otherwise>
        </xsl:choose>
</xsl:variable>
--~----------------------------------------------------------------


I think it's usually better to just write the code so that it works
with any EOL marker. Any test like the one you suggest above will get
confused by, for example, the (not uncommon) case of files with
inconsistent line endings.

For example if you have a text file file.txt then

<xsl:param name="filelinest"
       select="tokenize(unparsed-text('file.txt'),'[\r\n]+')"/>

will give you a sequence of strings in which any run of U+000A and
U+000D characters counts as a single separator.
(The version above intentionally drops blank lines. If you need to
treat \r\n\r\n as containing a blank line, you need a slightly
different regexp such as \r\n|\r|\n, with \r\n listed first so that a
CR+LF pair is matched as one separator rather than two.)
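
To make the difference between the two kinds of regexp concrete, here
is a minimal sketch. The $sample string and the template name "main"
are made up for illustration, and the second regexp lists \r\n first
so that a CR+LF pair is consumed as one separator. It assumes an
XSLT 2.0 processor started at a named template (with Saxon that would
be something like -it:main).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical sample standing in for unparsed-text('file.txt'):
       "a" CR LF "b" LF LF "c" CR "d", i.e. mixed line endings
       with one blank line between "b" and "c" -->
  <xsl:variable name="sample" as="xs:string"
      select="codepoints-to-string((97, 13, 10, 98, 10, 10, 99, 13, 100))"/>

  <xsl:template name="main">
    <!-- Any run of CR/LF is one separator, so the blank line is dropped -->
    <xsl:variable name="non-blank" as="xs:string*"
        select="tokenize($sample, '[\r\n]+')"/>
    <!-- Each CR LF, CR or LF is one separator, so the blank line is kept -->
    <xsl:variable name="all" as="xs:string*"
        select="tokenize($sample, '\r\n|\r|\n')"/>
    <!-- Outputs the two line counts separated by a space: 4 5 -->
    <xsl:value-of select="count($non-blank), count($all)"/>
  </xsl:template>
</xsl:stylesheet>
```

The first tokenization yields a, b, c, d; the second yields a, b, an
empty string, c, d. Neither needs to know in advance which end-of-line
marker the file uses.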

XPath 3.0 provides a function, unparsed-text-lines(), that does more
or less exactly this (probably more efficiently than running a regexp
over the whole file contents):

https://www.w3.org/TR/xpath-functions-30/#func-unparsed-text-lines
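
As a rough sketch of how that function might be used (assuming an
XSLT 3.0 processor and a file.txt sitting next to the stylesheet; the
parameter name "lines" and the numbered output format are just for
illustration):

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- One string per line; CR LF, CR and LF are all accepted as line
       endings. The relative URI 'file.txt' is resolved against the
       base URI of the stylesheet. -->
  <xsl:param name="lines" as="xs:string*"
      select="unparsed-text-lines('file.txt')"/>

  <!-- Default entry point in XSLT 3.0 when no source document is given -->
  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="$lines">
      <!-- Print each line prefixed with its line number -->
      <xsl:value-of select="position(), ': ', ., '&#10;'" separator=""/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the [\r\n]+ version, this keeps blank lines inside the file;
only an empty string after a final line ending is discarded.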

David