xsl-list

Re: [xsl] How to modify a RDF document and preserve the <!DOCTYPE rdf:RDF[ and entity references?

2011-03-06 07:33:38
Hi Alex,

If I understand your question correctly, you want to transform an XML document that contains a DOCTYPE declaration, but you don't want the named entities to be replaced by the entity values? There are a few things to consider:

1) XSLT does not preserve the doctype, but you can work around this by reading the source document with unparsed-text() and adding the declaration to the result tree with disable-output-escaping, or by using a trickier (but more extensible) approach that involves xsl:character-map. This at least gives you the doctype declaration back.

2) Your doctype declares entities for namespace names. These are expanded automatically, in the correct locations, when the XSLT processor reads the document. If you want to shorten the expanded values back to your named entity references, you are pushing beyond what XSLT is supposed to do: create well-formed XML with namespaces. It's usually a bad idea, and the trick to fix this is so cumbersome (maybe someone knows a better way) that I strongly advise against it. The simplest approach I can think of is to take the output XML and read it again, this time as text with unparsed-text(), and use a regular expression to replace the namespace URIs. Use text as the output method for this second pass.

This is a scenario that's not well supported by XSLT, simply because you are trying to reverse something that the XML parser has already removed (before the document even gets to the XSLT processor).

3) From an XML point of view, there's no difference between your document with or without the doctype and named entities, provided the named entities are replaced with the entity values. If your document is supposed to be machine-processed, then there really is no need to go to great lengths to bypass the XML standards.

4) An alternative solution: again, use character maps, but generate the XSLT from your input (or, if the doctype doesn't change, do it only once). The character maps will then replace the characters in the output.
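
A minimal sketch of what (4) could look like for attribute values, assuming XSLT 2.0. A character map replaces single characters, not strings, so the sketch first swaps the URI prefix for a placeholder character and lets the serializer write that placeholder as the entity reference. The placeholder U+E001, the map name "entities" and the variable name "wiki" are my own choices for illustration; only the &wiki; entity and rdf:resource come from your example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <xsl:output method="xml" use-character-maps="entities"/>

  <!-- U+E001 is an arbitrary private-use placeholder (an assumption).
       On serialization it is written as the literal string &wiki; -->
  <xsl:character-map name="entities">
    <xsl:output-character character="&#xE001;" string="&amp;wiki;"/>
  </xsl:character-map>

  <!-- Replacement text of the wiki entity, copied from the DOCTYPE -->
  <xsl:variable name="wiki"
      select="'http://p13.itawiki.org/wiki/Special:URIResolver/'"/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Swap the expanded URI prefix for the placeholder character -->
  <xsl:template match="@rdf:resource[starts-with(., $wiki)]">
    <xsl:attribute name="rdf:resource"
        select="concat('&#xE001;', substring-after(., $wiki))"/>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats: the result is only well-formed if the DOCTYPE that declares the entity is written to the output as well, and a value that starts with the longer &property; URI would come out here as &wiki;Property-3A... unless you add a second placeholder and test for the longer prefix first.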

Solution (1) above can be found with an example from a couple of years back on this list, but I can't remember when exactly. I'm not entirely sure whether solution (4) works with namespace declarations, because technically, they aren't attributes. But I think they ought to be replaced as well.
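
Since I can't point you to the old post, here is a rough sketch of the unparsed-text() variant of solution (1), assuming XSLT 2.0. It assumes that your processor can report document-uri(/) for the source, that it supports disable-output-escaping (an optional feature), and that the input really has a DOCTYPE with an internal subset ending in "]>". The variable names "raw" and "doctype" are mine.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Raw text of the source file, read a second time without XML parsing -->
  <xsl:variable name="raw" select="unparsed-text(document-uri(/))"/>

  <!-- Everything from <!DOCTYPE up to the first ]> (end of the internal subset) -->
  <xsl:variable name="doctype"
      select="concat('&lt;!DOCTYPE',
                     substring-before(substring-after($raw, '&lt;!DOCTYPE'), ']&gt;'),
                     ']&gt;')"/>

  <!-- Write the saved declaration ahead of the transformed document -->
  <xsl:template match="/">
    <xsl:value-of select="$doctype" disable-output-escaping="yes"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Identity template: replace with your actual modifications -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This only brings the DOCTYPE back. The entity references in the body are still expanded, as described under (2).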

But, as others on this list may suggest as well: consider not doing this at all. It's messy, a lot of work, and it doesn't improve your output from an XML point of view.

Kind regards,
Abel Braaksma



On 6-3-2011 13:27, Alex Muir wrote:
Hi,

What do I need to add to an xslt 2.0 stylesheet that modifies an RDF
file which has a doctype declaration with entity references. I'm not
certain how to preserve the DOCTYPE here exactly as shown and also
preserve the entity references such as &wiki; within the document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF[
     <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
     <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
     <!ENTITY owl 'http://www.w3.org/2002/07/owl#'>
     <!ENTITY swivt 'http://semantic-mediawiki.org/swivt/1.0#'>
     <!ENTITY wiki 'http://p13.itawiki.org/wiki/Special:URIResolver/'>
     <!ENTITY property
'http://p13.itawiki.org/wiki/Special:URIResolver/Property-3A'>
     <!ENTITY wikiurl 'http://localhost/wiki/'>
]>

<rdf:RDF
     xmlns:rdf="&rdf;"
     xmlns:rdfs="&rdfs;"
     xmlns:owl ="&owl;"
     xmlns:swivt="&swivt;"
     xmlns:wiki="&wiki;"
     xmlns:property="&property;">

Currently this in the input:
<property:Office rdf:resource="&wiki;BX"/>

is being output like this by my XSLT:
<property:Office rdf:resource="http://p13.itawiki.org/wiki/Special:URIResolver/BX"/>

I've been reading some old posts on this but I haven't been able to
key in on the right solution via google.

Regards

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--