[xsl] Re: collection() and uncommon file extensions

2018-11-15 16:52:43
I'm actually encountering the same problem if I change the extensions from .hocr to .xml, so there's definitely something odd going on here. The files are definitely well-formed and appear to be valid. They start like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

...

Do you see anything that would prevent Saxon 9.9 HE from parsing these as XML? I've come across another situation where files were recognized as XML if they had the XML declaration but not if they didn't (those were SVG), but these fail even with the declaration.

Cheers,
Martin



On 2018-11-15 12:59 p.m., Michael Kay mike(_at_)saxonica(_dot_)com wrote:
Everything about the collection() function is very implementation-specific, so 
this is really a Saxon question rather than an XSLT question. (And no, there 
are no plans to define standards in this area, though it would be nice.)

The way you are going about it looks right to me. It's probably failing because 
of some detail that you didn't realise was important. I know it's difficult to 
put together a repro for this kind of problem but that's really what we need.

Around 40 years ago I worked with an operating system that knew the content 
type of each file. Shame the idea didn't catch on.

Michael Kay
Saxonica



On 15 Nov 2018, at 19:32, Martin Holmes 
gtxxgm-xsl-list-2(_at_)m(_dot_)gmane(_dot_)org 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi all,

The recent changes to XPath 
(https://www.w3.org/TR/xpath-functions-31/#func-collection) have introduced the 
capability for the collection() function to retrieve non-XML documents as well 
as XML documents. However, that has broken some processes I have where XML 
documents with different extensions are being retrieved. For instance, where 
this:

collection('dir/?*.hocr')

used to happily retrieve and parse HOCR files (which are actually XHTML), Saxon 
now treats these files as xs:base64Binary items, and won't parse them, even 
though they have XML declarations.

I know that the recommended approach to dealing with this is to use a Saxon 
configuration file to register the file extension -- which I presume would be 
done like this:

<resources>
  <fileExtension extension="hocr" mediaType="text/xml"/>
</resources>

However, this doesn't seem to work for me -- do I have that syntax wrong?

Also, the conf file approach isn't easily portable, so I'm wondering if there 
are any plans to enable the media type to be specified on the collection() 
function itself, or to be registered in an XSLT document somehow?

Cheers,
Martin
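
[Editorial sketch, not part of the original thread.] On the portability question, one approach that avoids a configuration file is to ask for the URIs only, using the standard uri-collection() function, and then parse each one explicitly with doc(), which always treats its target as XML whatever the file extension. The stylesheet below is a minimal sketch under stated assumptions: the parameter name hocr-dir, the report and file output elements, and the 'dir?select=*.hocr' form of the collection URI (Saxon's select query parameter for directory collections) are illustrative and do not come from the messages in this thread. It has not been tested against the files discussed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: a directory URI plus Saxon's select filter.
       A relative URI is resolved against the stylesheet's base URI. -->
  <xsl:param name="hocr-dir" as="xs:string" select="'dir?select=*.hocr'"/>

  <xsl:template name="xsl:initial-template">
    <!-- uri-collection() returns URIs without fetching the resources,
         so no media-type guessing is involved. doc() then parses each
         URI as XML, independent of the .hocr extension. -->
    <xsl:variable name="docs" as="document-node()*"
        select="uri-collection($hocr-dir) ! doc(.)"/>

    <!-- Illustrative output: list each parsed file and its root element. -->
    <report count="{count($docs)}">
      <xsl:for-each select="$docs">
        <file uri="{document-uri(.)}" root="{name(*)}"/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Because the template is named xsl:initial-template, the stylesheet can be run without a source document, for example with Saxon's -it command-line option. If doc() fails on a file, the error message should say why the parser rejected it, which may also help narrow down the problem described above.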


