That general requirement (simple string macros) can be satisfied using
XInclude, which is implemented by most, if not all, modern XML parsers.
XInclude has many limitations (it is not a true use-by-reference facility), but
it offers at least the same level of utility as text entities without
requiring DTDs and without some of their problematic aspects
(for example, you can choose to defer or ignore XInclude elements,
which I often want to do, depending on the processing context).
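A minimal sketch of that deferral point, assuming Python's standard xml.etree.ElementInclude module. The in-memory SNIPPETS table, the load_snippet loader, and the parse_finding_aid helper are hypothetical names used only for illustration; the file name and address are borrowed from the quoted messages below.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared snippets, standing in for files such as sua_address.txt.
SNIPPETS = {"sua_address.txt": "222 Waverly Ave, Syracuse, New York 13244"}

DOC = """<ead xmlns:xi="http://www.w3.org/2001/XInclude">
  <address><xi:include href="sua_address.txt" parse="text"/></address>
</ead>"""


def load_snippet(href, parse, encoding=None):
    """Loader callback for ElementInclude: return text for parse="text"."""
    if parse != "text":
        raise ValueError("this sketch only handles parse='text' includes")
    return SNIPPETS[href]


def parse_finding_aid(xml_text, expand=True):
    """Parse the document; resolve XInclude elements only when asked to."""
    root = ET.fromstring(xml_text)
    if expand:
        ElementInclude.include(root, loader=load_snippet)
    return root


expanded = parse_finding_aid(DOC)                # includes resolved
deferred = parse_finding_aid(DOC, expand=False)  # includes left in place

print(expanded.find("address").text)
print(deferred.find("address")[0].tag)
```

Unlike an entity reference, the xi:include element is ordinary markup, so the deferred tree is still a complete, well-formed document that a later processing step can expand or ignore.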
I could go further and say that the original SGML design of DTDs was entirely
misguided, should never have been done that way, and certainly
shouldn't have been carried into XML (again, I argued *for* them at
the time), but that's easy for me to say now. At the time SGML was being
defined and implemented, the DTD syntax seemed perfectly sensible, and it took a
long time for us to recognize the inherent problems with DTDs as they exist in
SGML and XML.
In particular, because they are a purely syntactic mechanism, DTDs are a
security risk and provide no reliable declaration of the actual semantic
document type of the document that exhibits the DOCTYPE declaration.
Consider this example:
<!DOCTYPE foo [
<!ENTITY gotcha SYSTEM "/usr/etc/.passwords">
]>
<foo>&gotcha;</foo>
Now load that into a CMS that shall remain nameless running as "root" and look
at the content that gets stored. Oops.
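One possible guard, sketched under assumptions rather than offered as a complete defense: have the loader refuse any document that declares an external entity before anything is stored. The sketch uses Python's standard expat binding; reject_external_entities and ExternalEntityError are hypothetical names.

```python
import xml.parsers.expat


class ExternalEntityError(ValueError):
    """Raised when a document declares an entity with a system identifier."""


def reject_external_entities(xml_text):
    """Parse xml_text, aborting at the first external entity declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a literal value and no system identifier;
        # external ones point at a file or URL, which is the risky case.
        if system_id is not None:
            raise ExternalEntityError(
                f"external entity {name!r} points at {system_id!r}")

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)


GOTCHA = """<!DOCTYPE foo [
<!ENTITY gotcha SYSTEM "/usr/etc/.passwords">
]>
<foo>&gotcha;</foo>"""

try:
    reject_external_entities(GOTCHA)
except ExternalEntityError as err:
    print("rejected:", err)
```

The exception raised in the handler propagates out of Parse, so the file named by the system identifier is never opened.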
Or consider:
<!DOCTYPE notabook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://docbook.org/xml/4.5/docbookx.dtd"[
<!ELEMENT notabook
(foo, bar?)
>
<!ELEMENT foo EMPTY >
<!ELEMENT bar EMPTY >
]>
<notabook>
<foo/>
</notabook>
Here the DOCTYPE appears to declare this to be a DocBook 4 book, and many, if
not most, DTD-aware systems will use the public ID to bind this document to
their DocBook-specific configuration.
But this is clearly not a DocBook document (at least to a human observer). Even
so, an XML system that simply requires the document to be A) valid and B)
associated with a known external DTD will likely accept it without complaint.
Thus, the DOCTYPE declaration tells you *nothing* actionable about the document
itself. The document is completely valid (assuming I didn't introduce typos in
the internal declaration subset), but its DOCTYPE declaration is meaningless.
By having the grammar declared only by reference (to a RELAX NG schema, an XSD
schema, or some other grammar) and by using namespaces to qualify at least one
thing in the document (as the DITA standard does with the
@ditaarch:DITAArchVersion attribute), the document is unalterably associated
with the definition of the thing it's supposed to be. That is, the namespace
name and the URIs of any associated grammars function as names of the "true
type" of the document, as opposed to just pointers to syntactic rules that
guide parsing and validation.
Compare with:
<?xml-model href="http://docbook.org/xml/5.1/rng/docbook.rng"
schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://docbook.org/xml/5.1/rng/docbook.rng"
type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<notabook>
<foo/>
</notabook>
This is clearly, and unambiguously, not a DocBook document. The model
references unalterably bind the document to governing schemas that will detect
the document's invalidity. The lack of the expected (and required) DocBook
namespace on the root element also exposes this as not being a DocBook document.
Likewise, there is no simple syntactic macro expansion happening here, so the
security exposure is lower.
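The namespace check can be sketched in a few lines, assuming Python's standard ElementTree module and the DocBook 5 namespace name http://docbook.org/ns/docbook. The helper names root_namespace and is_docbook are hypothetical.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as "{namespace}local-name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def is_docbook(xml_text):
    """True only when the root element is in the DocBook namespace."""
    return root_namespace(xml_text) == DOCBOOK_NS


NOT_A_BOOK = "<notabook><foo/></notabook>"
REAL_BOOK = ('<book xmlns="http://docbook.org/ns/docbook" version="5.1">'
             "<title>Example</title></book>")

print(is_docbook(NOT_A_BOOK))
print(is_docbook(REAL_BOOK))
```

The test looks only at the document itself, so nothing a DOCTYPE declaration or processing instruction claims can change the answer.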
So not a fan of DTDs.
From a pure validation convenience standpoint, DTDs are still largely required
in many contexts because RELAX NG is not (yet) ubiquitous and has only
recently gotten the infrastructure needed to support default attributes. But
that doesn't mean you should also use entities.
I would suggest that in contexts where DTDs are currently used, RELAX NG
would always be the better solution given appropriate infrastructure, but that
infrastructure is only just now coming to fruition. The DITA community in
particular has pushed forward more complete implementation of the RELAX NG DTD
compatibility features, mostly through the efforts of George Bina, because DITA
depends on defaulted attributes, XSD 1.0 simply doesn't work for DITA, and
XSD 1.1 is not widely enough implemented.
Cheers,
E.
--
Eliot Kimber
http://contrext.com
On 2/26/19, 3:00 PM, "Michele R Combs mrrothen(_at_)syr(_dot_)edu"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Well, I can't speak for the entire XML community but I can tell you that we
like it a lot. Our address, the URL to our main library catalog, the
URL to our dept home page, etc. are subject to change on a regular basis.
Having them as entities referenced from our finding aids, rather than
hard-coded into each file, means that when there is a change we only have to
update one small XML snippet rather than 3000+ XML files.
Michele
-----Original Message-----
From: Eliot Kimber ekimber(_at_)contrext(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Tuesday, February 26, 2019 2:51 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Adding entity declarations to DOCTYPE in xml output
For the record, retaining external and internal text entities in XML was a
mistake. It's something I fought for at the time and now regret every time it
comes up.
The XML community has been wise in forgetting that text entities were ever
a feature.
Cheers,
Eliot
On 2/26/19, 12:27 PM, "Michele R Combs mrrothen(_at_)syr(_dot_)edu"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Yeah, I was afraid that was the only way to do it :P Thanks --
Michele
-----Original Message-----
From: Michael Kay mike(_at_)saxonica(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Monday, February 25, 2019 4:35 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Adding entity declarations to DOCTYPE in xml output
If you're able to use Saxon, consider using the saxon:doctype extension
instruction.
It can't be done with any version of standard XSLT, except by
generating the DTD "by hand" using disable-output-escaping.
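A different route from either of those, sketched here as an assumption rather than a recommendation: let XSLT write the plain DOCTYPE and add the internal subset in a post-processing step. The add_internal_subset helper below is hypothetical and works on the serialized text, so it assumes the DOCTYPE has no internal subset yet and that neither identifier contains a ">" or "[" character.

```python
# Entity names and system identifiers taken from the desired output quoted below.
ENTITY_DECLS = [
    ("sua_name", "sua_name.txt"),
    ("sua_address", "sua_address.txt"),
    ("subjindex", "sua_index.txt"),
    ("summitref", "summit_ref.txt"),
]


def add_internal_subset(serialized, entities):
    """Insert external entity declarations into the first DOCTYPE declaration."""
    start = serialized.find("<!DOCTYPE")
    if start == -1:
        raise ValueError("no DOCTYPE declaration found")
    end = serialized.index(">", start)  # closing '>' of the declaration
    if "[" in serialized[start:end]:
        raise ValueError("DOCTYPE already has an internal subset")
    decls = "\n".join(
        f'<!ENTITY {name} SYSTEM "{system_id}">' for name, system_id in entities
    )
    return serialized[:end] + " [\n" + decls + "\n]" + serialized[end:]


SERIALIZED = (
    '<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded '
    'Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd">\n'
    "<ead/>"
)

print(add_internal_subset(SERIALIZED, ENTITY_DECLS))
```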
Michael Kay
Saxonica
> On 25 Feb 2019, at 21:15, Michele R Combs mrrothen(_at_)syr(_dot_)edu
> <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>
> Hello collective wisdom -
>
> I would like to have several entity declarations in my output XML.
> Here's what I currently have in my XSL:
>
> <xsl:output
> method="xml"
> indent="yes"
> encoding="utf-8"
> exclude-result-prefixes="ns"
> omit-xml-declaration="yes"
> doctype-system="../ead_dtd/ead.dtd"
> doctype-public="+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN"/>
>
>
> The output XML looks like this:
>
> <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded
> Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd">
>
>
> I would like it to look like this:
>
> <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded
> Archival Description (EAD) Version 2002)//EN" "../ead_dtd/ead.dtd" [
> <!ENTITY sua_name SYSTEM "sua_name.txt">
> <!ENTITY sua_address SYSTEM "sua_address.txt">
> <!ENTITY subjindex SYSTEM "sua_index.txt">
> <!ENTITY summitref SYSTEM "summit_ref.txt">
> ]>
>
>
> Is this doable with XSL 1.1?
>
> Thanks --
>
> Michele
> +++++++++
> Michele Combs | Lead Archivist
> Special Collections Research Center
> Syracuse University Libraries
> 222 Waverly Ave
> Syracuse, New York 13244
> t 315.443-2081 | e mrrothen(_at_)syr(_dot_)edu | w scrc.syr.edu
> SYRACUSE UNIVERSITY syr.edu