
RE: [xsl] When are <!DOCTYPE> and svg namespace references material?

2010-02-03 13:40:19
Wow - this is extremely helpful to me and hopefully others on the list. Thank 
you very much! 

-----Original Message-----
From: C. M. Sperberg-McQueen [mailto:cmsmcq(_at_)blackmesatech(_dot_)com] 
Sent: Wednesday, February 03, 2010 1:02 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Cc: C. M. Sperberg-McQueen
Subject: Re: [xsl] When are <!DOCTYPE> and svg namespace references material?


On 3 Feb 2010, at 10:57 , Ylvisaker, Steve wrote:


There is a concern that our SVG graphics implementation may be 
introducing external reference dependencies outside our local network. 
An example graphic is:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     x="0px" y="0px" width="357.553px" height="216.893px"
     viewBox="0 0 357.553 216.893"
     enable-background="new 0 0 357.553 216.893" xml:space="preserve">

Our graphics are, for the most part, generated by Adobe Illustrator
CS3 but we are running xslt transformations against them with Saxon 
and viewing the graphics with a variety of tools: Firefox, InkScape, 
Ai CS3, Antenna House formatter and Saxon-PE 9.2.0.2

I have isolated my work station (no corporate network or internet) and 
all of these applications work fine. But I don't know if they are 
trying to make an external reference, failing and driving on, or if 
the <!DOCTYPE> and W3 name space references are little more than 
documentation.

The only way to be certain would be to use some system utility which notices 
and reports attempts to open network ports.
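
As a rough in-process stand-in for such a utility, here is a minimal Python
sketch. It assumes the parser under test is Python's own ElementTree (not one
of the applications named above), and the helper names record_connections and
parse_and_report are made up for illustration. It replaces the socket connect
call with one that records and refuses every attempt, so any attempt to reach
www.w3.org during parsing would show up in the list.

```python
import socket
import xml.etree.ElementTree as ET
from contextlib import contextmanager

# A self-closing variant of the sample graphic, so that it is well-formed.
SVG_SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="357.553px" height="216.893px"/>
"""


@contextmanager
def record_connections():
    """Record and refuse every socket connect attempted inside the block."""
    attempts = []
    original_connect = socket.socket.connect

    def refusing_connect(sock, address):
        attempts.append(address)
        raise ConnectionRefusedError(f"blocked connect to {address!r}")

    socket.socket.connect = refusing_connect
    try:
        yield attempts
    finally:
        socket.socket.connect = original_connect


def parse_and_report(data):
    """Parse the document and return (root tag, recorded connect attempts)."""
    with record_connections() as attempts:
        root = ET.fromstring(data)
    return root.tag, attempts


if __name__ == "__main__":
    tag, attempts = parse_and_report(SVG_SAMPLE)
    print("root element:", tag)
    print("connection attempts:", attempts)
```

This only sees connections made by Python code in the same process. For
Firefox, Inkscape, Illustrator, or a Java-based tool, the equivalent check has
to be made at the operating-system or network level.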

The short answer is that none of the relevant specs themselves require without 
qualification that such network resources be read, but they also don't forbid 
it.

The longer answer has several parts.

(1) The presence of a DOCTYPE declaration does not, in principle, mean that the 
external DTD file must be dereferenced, though that is often the effect in 
practice.

The URI "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" given as the system
identifier for the DTD must be consulted by any processor performing DTD-based
validation on the data.  The presence of a DOCTYPE declaration does not
constitute an instruction to validate the document, and in principle it would
be good if processors like Firefox allowed you to specify whether you want
validation performed or not.  But in practice, many programs don't provide that
kind of user control; instead they assume that if a DOCTYPE declaration is
present, they must or should validate the document.  For such programs, a
request that they read a particular document amounts in effect to an
instruction that they should validate it, too, if a DOCTYPE declaration is
present.

Note that a program validating the document may or may not actually hit the
network:  the authoritative source for the document is the server identified, 
but if your system has a caching proxy and the DTD is in the cache, there will 
not necessarily be any network traffic.  And software built to work with 
documents of a particular kind may have and consult a locally cached copy of 
the DTD instead of retrieving it from the network.  In the case of DTDs served 
from W3C servers, the DTDs change very infrequently and the expiration dates 
are set to encourage local caching; experience on those servers shows that 
surprising numbers of programs and packages are willing to request the same 
resource thousands of times in the same minute, whether the requests succeed or 
fail.  When this happens frequently, it can place a bit of a strain on the 
server involved, so well behaved software should arrange for some kind of local 
cache.

See http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
for a more complete account of some relevant issues.
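
To make the local-copy idea concrete, here is a minimal sketch using Python's
xml.sax module. It is only an illustration under assumptions: the names
LOCAL_DTD_CACHE, CachingResolver, and parse are hypothetical, and a one-line
stub stands in for a real stored copy of svg11.dtd. The resolver answers the
request for the DTD from the local table, and the same document is parsed once
with external entity loading switched on and once with it switched off.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

SVG_DTD = "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"

# Hypothetical local cache: system identifier -> bytes of a stored copy.
# A one-line stub stands in for the real DTD here.
LOCAL_DTD_CACHE = {SVG_DTD: b"<!-- stand-in for a cached copy of svg11.dtd -->"}

# A self-closing variant of the sample graphic, so that it is well-formed.
SVG_SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg"
     width="357.553px" height="216.893px"/>
"""


class CachingResolver(EntityResolver):
    """Serve external entities from the local table and log each request."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append(systemId)
        source = InputSource(systemId)
        # Unknown identifiers get an empty entity rather than a network fetch.
        source.setByteStream(io.BytesIO(LOCAL_DTD_CACHE.get(systemId, b"")))
        return source


class ElementNames(ContentHandler):
    """Collect element names, to show that parsing ran to completion."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse(data, load_external):
    """Return (element names seen, system identifiers the parser asked for)."""
    resolver = CachingResolver()
    handler = ElementNames()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, load_external)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names, resolver.requests


if __name__ == "__main__":
    print(parse(SVG_SAMPLE, load_external=True))
    print(parse(SVG_SAMPLE, load_external=False))
```

With loading switched on, the resolver is asked for the DTD and answers from
the table; with it switched off, the DOCTYPE declaration is read but nothing is
requested.  Which of the two a given application does by default is exactly
the "it depends on the program" part.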

(2) Many programs will fail gracefully (or relatively gracefully) if they can't 
get to the DTD.

Many programs which attempt validation whenever they see a DOCTYPE declaration 
will shrug their shoulders and proceed without validation if they don't succeed 
in retrieving the required external resources (such as the DTD).
The logic of this behavior is not completely clear (if you think validation is 
required, why would you proceed anyway if you can't perform validation?), but 
it's not uncommon.

(3) Namespace names serve purposes of uniqueness and documentation.  They will
seldom need to be dereferenced.

The URIs "http://www.w3.org/2000/svg" and "http://www.w3.org/1999/xlink"
in your sample graphic identify certain constructs in the XML as being in the
SVG or the XLink namespaces, respectively.  The crucial effect of this is to
ensure that when the same local name is used in two different namespaces,
markup can reliably be assigned to one or to the other.  There is no need to
dereference the namespace URI in order for software to perform that function.

Any software responsible for processing a particular vocabulary will need to
know, given an element named (for example) "desc", whether it's the "desc"
element it knows about (e.g. the SVG desc), or some other "desc" element
(any desc in any other namespace).  That also does not require that the URI be
dereferenced; software built to process SVG, for example, will almost certainly
have the SVG URI hard-coded into it somewhere.
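
A small sketch of that hard-coded comparison, in Python with ElementTree.  The
second namespace, "urn:example:other-vocabulary", and the function name
svg_descriptions are invented for the illustration.  The namespace URI is used
purely as a string to compare against; nothing is fetched.

```python
import xml.etree.ElementTree as ET

# The SVG namespace name, hard-coded the way an SVG-aware program would have it.
SVG_NS = "http://www.w3.org/2000/svg"

DOC = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:other="urn:example:other-vocabulary">
  <desc>An SVG description</desc>
  <other:desc>Some other vocabulary's description</other:desc>
</svg>"""


def svg_descriptions(xml_text):
    """Return the text of every desc element that is in the SVG namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as "{namespace-uri}local-name".
    wanted = "{%s}desc" % SVG_NS
    return [el.text for el in root.iter() if el.tag == wanted]


if __name__ == "__main__":
    print(svg_descriptions(DOC))
```

Both elements have the local name "desc", but only the one whose namespace name
equals the hard-coded string is treated as SVG.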

On the other hand, namespace documents are occasionally used to provide links 
(e.g. via a RDDL document) to relevant resources, e.g. schema documents in 
various schema languages.  And so software may occasionally dereference a 
namespace URI to see if it can find relevant resources there.

And of course if a human is trying to understand what this SVG stuff is, then 
they might do worse than dereference the URI to see if it provides any useful 
human-readable information, or pointers to such information.  (The SVG and 
XLink URIs do in fact do this.)


Three of the applications, Firefox, InkScape and Adobe CS3 care about 
the name of the xmlns URL.

They should:  they include special code to process SVG, and that code should 
work on SVG elements and attributes but not on random markup in other 
namespaces.

Something other than www.w3.org trips them up. Antenna House and Saxon 
don't seem to care.

Saxon, not being an SVG processor, will almost certainly not care what 
namespace URI is used.  But if the namespace URI in the input document and the
one in the stylesheet don't match, you are unlikely to be getting the
transformation you had in mind.
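
The effect of a mismatch can be sketched as follows.  This is Python, not
XSLT: ElementTree's path lookup stands in for a stylesheet match pattern such
as svg:desc, and the URI "http://internal.example/2000/svg" and the function
count_desc are made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = '<svg xmlns="http://www.w3.org/2000/svg"><desc>logo</desc></svg>'


def count_desc(xml_text, svg_namespace):
    """Count desc children, binding the svg: prefix to the given namespace."""
    root = ET.fromstring(xml_text)
    return len(root.findall("svg:desc", {"svg": svg_namespace}))


if __name__ == "__main__":
    # Same URI on both sides: the element is found.
    print(count_desc(DOC, "http://www.w3.org/2000/svg"))
    # Different URI on the lookup side: no error, just no match.
    print(count_desc(DOC, "http://internal.example/2000/svg"))
```

No error is raised in the second case; the lookup simply selects nothing, which
is why a mismatch tends to show up as silently missing output.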

I don't know why Antenna House behaves as it does.

With the <!DOCTYPE> declaration I can reference www.w3.org as above, 
or reference an internal network URL or drop the declaration 
altogether and none of the applications perform differently. All of this 
is, of course, anecdotal data at best. It would be great to know for 
sure what is going on.

It sure would :)

My question: Is there ever an attempt to make an external reference to 
www.w3.org from either the <!DOCTYPE> declaration or the xmlns 
reference?

I hope the details above help a bit, even though the answer is a rather 
disappointing "it depends on the program".  Most XML specs work very hard to 
provide a declarative semantics for what they define, and the result is that 
conforming software has a fair bit of leeway as to what it does in particular 
cases.

If your organization is worried about things not working if the network goes 
down, I think your experiments show that that worry is not well founded.  I 
think you would be best advised not to try to strip out the references to 
external resources.

Michael Sperberg-McQueen

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************





--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



