Re: [xsl] Adding entity declarations to DOCTYPE in xml output

2019-02-26 15:52:34
That general requirement (simple string macros) can be satisfied using 
XInclude, which is implemented by most, if not all, of the modern XML parsers.

XInclude has many limitations (it is not a true use-by-reference facility) but 
it does have at least the same level of utility as text entities without 
requiring the use of DTDs and without some of the problematic aspects of DTDs 
(for example, you can choose to defer or ignore XInclude elements if you want, 
which I often do want depending on processing context).
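
As a rough illustration of the string-macro use case, here is a minimal Python sketch built on the standard library's xml.etree.ElementInclude. The file name address.txt, the in-memory SNIPPETS table, and the snippet_loader and expand helpers are all made up for this example; a real setup would load the snippet from disk or a URL. The resolve flag shows the "defer or ignore" option: if include() is never called, the xi:include element simply stays in the tree.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared snippets; in practice these would be files on disk.
SNIPPETS = {"address.txt": "222 Waverly Ave"}

DOC = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<p>Visit us at <xi:include href="address.txt" parse="text"/>.</p>'
    '</doc>'
)


def snippet_loader(href, parse, encoding=None):
    """Resolve xi:include targets from the in-memory table (an assumption)."""
    if parse != "text":
        raise ValueError("this sketch only handles parse='text'")
    return SNIPPETS[href]


def expand(xml_text, resolve=True):
    """Parse xml_text; expand XInclude elements only when resolve is True."""
    root = ET.fromstring(xml_text)
    if resolve:
        ElementInclude.include(root, loader=snippet_loader)
    return root


expanded = expand(DOC)
deferred = expand(DOC, resolve=False)
print(expanded.find("p").text)   # include replaced by the snippet text
print(len(deferred.find("p")))   # the xi:include element is still a child
```

Changing the one snippet changes every document that includes it, which is the same maintenance benefit text entities give, without a DOCTYPE.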

I could go farther and say that the original SGML design of DTDs was entirely 
misguided as well and should never have been done that way and certainly 
shouldn't have been carried into XML (again, I certainly argued *for* them at 
the time) but that's easy for me to say now. At the time that SGML was being 
defined and implemented the DTD syntax seemed perfectly sensible and it took a 
long time for us to recognize the inherent problems with DTDs as they exist in 

In particular, because they are a purely syntactic mechanism, DTDs are a 
security risk and provide no reliable declaration of the actual semantic 
document type of the document that exhibits the DOCTYPE declaration.

Consider this example:

<!DOCTYPE foo [
  <!ENTITY gotcha SYSTEM "/usr/etc/.passwords">
]>
<foo>&gotcha;</foo>

Now load that into a CMS that shall remain nameless running as "root" and look 
at the content that gets stored. Oops.
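
To check how a given parser treats this kind of document, a small probe along the following lines can help. This is only a sketch: the temporary file stands in for the sensitive file, and the leaks_secret helper is invented for the example. Python's xml.etree.ElementTree is documented as not expanding external entities, so it is expected to report no leak; a parser configured to resolve external entities would pull the file content into the document.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def leaks_secret(secret_text):
    """Return True if parsing pulls the external file's content into the tree."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a sensitive file on the server (hypothetical path).
        secret_path = os.path.join(tmp, "secret.txt")
        with open(secret_path, "w", encoding="utf-8") as fh:
            fh.write(secret_text)
        doc = (
            "<!DOCTYPE foo [\n"
            f'  <!ENTITY gotcha SYSTEM "{secret_path}">\n'
            "]>\n"
            "<foo>&gotcha;</foo>"
        )
        try:
            text = ET.fromstring(doc).text or ""
        except ET.ParseError:
            # The parser refused the entity reference outright.
            text = ""
    return secret_text in text


leaked = leaks_secret("TOP-SECRET")
print("leaked" if leaked else "not leaked")
```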

Or consider:

<!DOCTYPE notabook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
<!ELEMENT notabook
  (foo, bar?)

Here the DOCTYPE appears to declare this to be a DocBook 4 book and many, if 
not most, DTD-aware systems will use the public ID to bind this document to its 
DocBook-specific configuration.

This is clearly not a DocBook document, at least to a human observer. Yet 
an XML system that simply requires the document to be (a) valid and (b) 
associated with a known external DTD will likely happily accept it.

Thus, the DOCTYPE declaration tells you *nothing* actionable about the document 
itself. It's completely valid (assuming I didn't introduce typos in the 
internal declaration subset) but meaningless.

By having the grammar declared only by reference, i.e., RELAX NG, XSD, or some 
other grammar, and by using namespaces to qualify at least one thing in the 
document (as the DITA standard does with the @dita:DITAArchVersion attribute) 
the document is unalterably associated with the definition of the thing it's 
supposed to be (that is, the namespace name and the URIs of any associated 
grammars function as names of the "true type" of the document, as opposed to 
just pointers to syntactic rules that guide parsing and validation).

Compare with:

<?xml-model href="http://docbook.org/xml/5.1/rng/docbook.rng"?>
<?xml-model href="http://docbook.org/xml/5.1/rng/docbook.rng" 
type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

This is clearly, and unambiguously, not a DocBook document. The model 
references unalterably bind the document to governing schemas that will detect 
the document's invalidity. The lack of the expected (and required) DocBook 
namespace on the root element also exposes this as not being a DocBook document.
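
The namespace check in particular is easy to automate. A minimal Python sketch, assuming the DocBook 5.x namespace name http://docbook.org/ns/docbook; the helper names and the two sample documents are invented for the example, and this only inspects the root element, it does not validate anything:

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"  # DocBook 5.x namespace name


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if unqualified."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        # ElementTree spells qualified names as "{namespace}local".
        return tag[1:].split("}", 1)[0]
    return None


def looks_like_docbook5(xml_text):
    """True only when the root element is in the DocBook namespace."""
    return root_namespace(xml_text) == DOCBOOK_NS


impostor = "<notabook><foo/></notabook>"
genuine = (
    '<book xmlns="http://docbook.org/ns/docbook" version="5.1">'
    "<title>T</title></book>"
)
print(looks_like_docbook5(impostor), looks_like_docbook5(genuine))
```

Unlike a public ID in a DOCTYPE, the namespace travels with the elements themselves, so the check cannot be satisfied by a declaration that the content then ignores.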

Likewise, there is no simple syntactic macro expansion happening here, so the 
security exposure is lower.

So not a fan of DTDs. 

From a pure validation convenience standpoint DTDs are still largely required 
in many contexts because RELAX NG is not (yet) ubiquitous and it has only 
recently gotten the infrastructure needed to support default attributes. But 
that doesn't mean you should also use entities.

I would suggest that in contexts where DTDs are currently used, RELAX NG 
would always be the better solution given appropriate infrastructure, but that 
infrastructure is only just now coming to fruition. The DITA community in 
particular has pushed forward more complete implementation of the RELAX NG DTD 
compatibility features, mostly through the efforts of George Bina, because DITA 
depends on defaulted attributes, XSD 1.0 simply doesn't work for DITA, and 
XSD 1.1 is not widely enough implemented.



Eliot Kimber

On 2/26/19, 3:00 PM, "Michele R Combs mrrothen(_at_)syr(_dot_)edu" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

    Well, I can't speak for the entire XML community but I can tell you that we 
like it a lot.  Our address, the URL to our main library catalog, the 
URL to our dept home page, etc. are subject to change on a regular basis.  
Having them as entities referenced from our finding aids, rather than 
hard-coded into each file, means that when there is a change we only have to 
update one small XML snippet rather than 3000+ XML files.
    -----Original Message-----
    From: Eliot Kimber ekimber(_at_)contrext(_dot_)com 
    Sent: Tuesday, February 26, 2019 2:51 PM
    To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
    Subject: Re: [xsl] Adding entity declarations to DOCTYPE in xml output
    For the record, retaining external and internal text entities in XML was a 
mistake. It's something I fought for at the time and now regret every time it 
comes up.
    The XML community has been wise in forgetting that text entities were ever 
a feature.
    Eliot Kimber
    On 2/26/19, 12:27 PM, "Michele R Combs mrrothen(_at_)syr(_dot_)edu" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
        Yeah, I was afraid that was the only way to do it :P  Thanks --
        -----Original Message-----
        From: Michael Kay mike(_at_)saxonica(_dot_)com 
        Sent: Monday, February 25, 2019 4:35 PM
        To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
        Subject: Re: [xsl] Adding entity declarations to DOCTYPE in xml output
        If you're able to use Saxon, consider using the saxon:doctype extension.
        It can't be done with any version of standard XSLT, except by 
generating the DTD "by hand" using disable-output-escaping.
        Michael Kay
        > On 25 Feb 2019, at 21:15, Michele R Combs mrrothen(_at_)syr(_dot_)edu 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
        > Hello collective wisdom -
        > I would like to have several entity declarations in my output XML.  
Here's what I currently have in my XSL:
        > <xsl:output
        >   method="xml"
        >   indent="yes"
        >   encoding="utf-8"
        >   exclude-result-prefixes="ns"
        >   omit-xml-declaration="yes"
        >   doctype-system="../ead_dtd/ead.dtd"
        >   doctype-public="+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded 
        > Archival Description (EAD) Version 2002)//EN"/>
        > The output XML looks like this:
        > <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded 
        > Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd">
        > I would like it to look like this:
        > <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded 
        > Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd" [ 
        > <!ENTITY sua_name SYSTEM "sua_name.txt"> <!ENTITY sua_address SYSTEM 
        > "sua_address.txt"> <!ENTITY subjindex SYSTEM "sua_index.txt"> 
        > <!ENTITY summitref SYSTEM "summit_ref.txt"> ]>
        > Is this doable with XSL 1.1?
        > Thanks --
        > Michele
        > +++++++++
        > Michele Combs | Lead Archivist
        > Special Collections Research Center
        > Syracuse University Libraries
        > 222 Waverly Ave
        > Syracuse, New York 13244
        > t 315.443-2081 | e mrrothen(_at_)syr(_dot_)edu | w scrc.syr.edu
        > SYRACUSE UNIVERSITY syr.edu