Re: Access to unparsed entities


On Fri, 18 Oct 2002, Jeni Tennison wrote:

Hi Wendell, Greg,

It would be nice to have such [unparsed] entities stored in a table
when the document is first read in, such that an XSL transformation
can read from and write to the table, and such that the table is
again written out in the document's internal DTD subset after
transformation is complete.


Wouldn't it? This sounds like something very nice for XSLT 2.0. Offhand,
I don't know what they're planning, if anything. (Can anyone speak to
that? Jeni?)


Hmm... Well, there's a "could" requirement for this in the XSLT 2.0
requirements [1]:

  2.16 Could Improve Support for Unparsed Entities

  In XSLT 1.0 there is an asymmetry in support for unparsed entities.
  They can be handled on input but not on output. In particular, there
  is no way to do an identity transformation that preserves them. At a
  minimum we need the ability to retrieve the Public ID of an unparsed
  entity.
 
The latest XSLT 2.0 WD has got a function to support the ability to
retrieve the public ID of an unparsed entity, namely
unparsed-entity-public-id() [2]. So there's enough information
available in the stylesheet to let you build the table of unparsed
entities yourself.


This is certainly an improvement, as at least no information from the source
document is inaccessible to the transformation.

If you did build such a table, you could then use the set of elements
described in Appendix G, "Representation of Lexical XML Constructs"
[3], to create a DOCTYPE declaration in which you declare the
entities that you want to declare. Something like:

  <lex:doctype name="foo">
    <xsl:for-each select="$entity">
      <lex:unparsed-entity-declaration name="{.}"
        system-id="{unparsed-entity-uri(.)}"
        public-id="{unparsed-entity-public-id(.)}" />
    </xsl:for-each>
  </lex:doctype>

(Hmm... I see that there's no way of getting the entity notation at
the moment; we should probably address that, but that, of course,
also means adding notation declarations, which aren't supported at
all currently -- or is the notation something that's derivable from
the public/system ID?)


Another possibility is to build the table using a SAX filter, and insert
the contents of the table into the document using elements defined in
Appendix G, as you demonstrate above. This has the advantage that it could
be made to work with XSLT 1.0, and wouldn't require any extensions.
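Below is a minimal sketch of such a filter using Python's expat-based `xml.sax` reader. The language is an assumption, since the thread names none. The parser reports each unparsed-entity and notation declaration to the filter's `DTDHandler` callbacks, and the filter keeps them in a table. When the root element starts, the filter emits the entity table as `lex:`-prefixed elements modeled on the Appendix G example above. Several details are hypothetical placeholders, not taken from the working draft: the namespace URI `urn:example:lex`, the extra `notation` attribute, and the names `UnparsedEntityFilter` and `rewrite`.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical placeholder: the real namespace URI is whatever Appendix G specifies.
LEX_NS = "urn:example:lex"


class UnparsedEntityFilter(XMLFilterBase):
    """SAX filter that tables unparsed-entity and notation declarations and
    re-emits the entity table as lex:* elements inside the root element."""

    def __init__(self, parent):
        super().__init__(parent)
        self.entities = {}   # name -> (public ID or None, system ID, notation name)
        self.notations = {}  # name -> (public ID or None, system ID or None)
        self._root_seen = False

    # DTDHandler callbacks: record declarations instead of forwarding them.
    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities[name] = (publicId, systemId, ndata)

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = (publicId, systemId)

    # Namespace processing is off by default, so startElement (not
    # startElementNS) is the callback that sees the root element.
    def startElement(self, name, attrs):
        super().startElement(name, attrs)
        if not self._root_seen:
            self._root_seen = True
            self._emit_table(name)

    def _emit_table(self, root_name):
        out = self._cont_handler
        out.startElement("lex:doctype", AttributesImpl(
            {"xmlns:lex": LEX_NS, "name": root_name}))
        for name, (public_id, system_id, notation) in self.entities.items():
            attrs = {"name": name, "system-id": system_id}
            if public_id is not None:
                attrs["public-id"] = public_id
            attrs["notation"] = notation  # hypothetical attribute, not in Appendix G
            out.startElement("lex:unparsed-entity-declaration", AttributesImpl(attrs))
            out.endElement("lex:unparsed-entity-declaration")
        out.endElement("lex:doctype")


def rewrite(xml_bytes):
    """Run a document through the filter; return the serialized result and the filter."""
    out = io.StringIO()
    filt = UnparsedEntityFilter(xml.sax.make_parser())
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_bytes))
    return out.getvalue(), filt


SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION GIF PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN">
  <!ENTITY logo SYSTEM "logo.gif" NDATA GIF>
  <!ATTLIST img src ENTITY #REQUIRED>
]>
<doc><img src="logo"/></doc>
"""
```

Running `rewrite(SAMPLE)` passes the `img` element through unchanged and inserts a `lex:doctype` element at the start of `doc`. The result can then go to an XSLT 1.0 stylesheet as ordinary elements. At the SAX level, the notation name arrives as its own argument to `unparsedEntityDecl`. It is reported by the parser, not derived from the public or system identifier.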

I hadn't read Appendix G, but now that I have, I think it is preferable to
trying to reconstruct the document type internal subset in the result
document. It converts all those archaic SGML constructs to plain old XML,
which will make all subsequent processing easier to understand.

If either or both of you could drop a line to
public-qt-comments(_at_)w3(_dot_)org giving an example of what you want to be
able to do, that would be helpful, especially if what I've described
above doesn't meet your requirements.


As long as nothing declared in the document is hidden from the
transformation, I think the standard is adequate. XSLT 2.0 has addressed
the lack of access to an entity's public identifier. It would be nice if a
future version also provided access to the notation. Unparsed external
entities are _very_ SGML, and in a schema-enlightened world they will hopefully
go away, so I don't think a strong case could be made for providing extra
support for their construction in an XML result document. Most of them
probably come from SGML documents converted to XML.

// Gregory Murphy <Gregory(_dot_)Murphy(_at_)sun(_dot_)com>
// Software Engineer
// Customer Network Platform, Sun Microsystems


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list