Re: External parsed entities (Re: Inconsistency between IETF and W3C...)



MURATA Makoto wrote:


Chris Lilley wrote:

Yes, agreed.

We have two choices.  One is to use text/xml or application/xml even for
external parsed entities.  The other is to use application/xml-epe
only for those external parsed entities which are not XML documents.  I think
that the latter is a complicated rule.


The former also has complications, since it means that application/xml
is "sometimes, but not always, well-formed XML". Since the terms valid
XML and well-formed XML are defined, but there is no defined term for
"stuff that is not well-formed", this is a problem. I think that this is
a significant complication.


Well, I do not think this is complicated.  text/xml or application/xml
means either external parsed entities or document entities.  This is simple.


And you may or may not be able to send the result to an XML parser. This
is not simple.

Whereas for the latter option, it is simple. Is the epe itself a
well-formed document? (This is easy to check mechanically.) If yes, label
it as application/xml. If no, label it as application/xml-epe (or whatever
term is chosen). This seems a simple, readily understood, and
machine-processable rule.
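
As a rough illustration of how mechanical that check could be, here is a
minimal Python sketch. The function names are hypothetical, and
application/xml-epe is only the placeholder name used in this thread, not
a registered type. It assumes the input is already known to be an
external parsed entity, so the only question is which label to pick.

```python
import xml.parsers.expat


def is_well_formed_document(data: bytes) -> bool:
    """Return True if data parses as a complete, well-formed XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def label_for_epe(data: bytes) -> str:
    """Pick a media type for an external parsed entity under the proposed rule."""
    if is_well_formed_document(data):
        return "application/xml"
    # Placeholder name from this discussion; an assumption, not a real type.
    return "application/xml-epe"
```

For example, an entity consisting of a single element would be labelled
application/xml, while one with two top-level elements or with character
data outside any element would be labelled application/xml-epe.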


Suppose that you make an XML document which references an external
parsed entity.  You are very likely to give recipients the URI of that
document but not that of the external parsed entity.  The external
parsed entity will thus be fetched only by XML processors during
parsing.  The fact that it is labelled as text/xml or application/xml
does not cause any problems.


Provided that it is not also referenced directly from anywhere - which,
if it is also a well-formed document in its own right, it might be.

I think "security through obscurity" is a poor plan when it would be
better just to have unambiguous labelling in the first place.

But the URI of the external parsed entity may become disclosed and some
program (e.g., WWW robots) may fetch it as a MIME entity.  This program
does not know if this MIME entity is an XML document or external parsed
entity.  If it parses as XML, it is an XML document.


If it does not, then it is a fatal error. However, according to your
proposal, it might still have been correctly labelled. According to
mine, it would have been incorrectly labelled.

It's the principle of least surprise, really.

Even if it does not, it may or may
not be an external parsed entity.  Is this a problem?


Yes, clearly. You describe a process whereby the MIME type told you no
useful information about the requested resource. That sounds like a
problem to me.
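
To make the difference concrete, here is a small Python sketch of what a
recipient can conclude under each labelling scheme. The scheme names
"shared" and "separate", the function name, and the returned strings are
all hypothetical, invented for this illustration; application/xml-epe is
again just the placeholder name from this thread.

```python
def conclude(scheme: str, media_type: str, parses_as_document: bool) -> str:
    """Say what a recipient learns from the label plus a document parse."""
    if scheme == "shared":
        # text/xml and application/xml cover documents and epes alike.
        if parses_as_document:
            return "XML document"
        return "unknown: an epe, or a broken document"
    if scheme == "separate":
        if media_type == "application/xml-epe":
            return "external parsed entity"
        if media_type != "application/xml":
            raise ValueError("unexpected media type: " + media_type)
        if parses_as_document:
            return "XML document"
        # application/xml promised well-formedness, so this is an error.
        return "mislabelled or broken"
    raise ValueError("unknown scheme: " + scheme)
```

Under the shared scheme a parse failure leaves the recipient guessing,
whereas under the separate scheme the same failure is unambiguously an
error on the sender's side.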

However, I have assumed that this issue is not very important since
we should avoid external parsed entities altogether on the Internet anyway.


(Out of curiosity - why? In the context of HTTP/1.1 keep-alive, it's
not very expensive to fetch an epe once. If the epe is shared between
two or more documents, there is a net win even with HTTP/1.0.)


Because different processors emit different outputs.  I personally think that
on the Internet, we should never use (1) default values declared in external
DTD subsets and external parameter entities, and (2) external parsed
entities.


That is your choice if you don't want to use these features. But the
features are a legal part of the XML 1.0 spec and thus any solution for
a MIME type or types for XML has to address all legal cases, not just
the ones you plan to use. I would have thought that was straightforward.
Or did you mean that you do not plan to use them and you believe that
no-one else either uses or will use them? If that is the case, I can
readily provide counter-examples.

Well, there is a move to define a category of "full infoset" parsers -
non validating, but which fetch epe's and external DTD subsets - which
deals with this problem.


I am not aware of such a move, and I have been a member of the XML Syntax WG.


Ask Tim Bray about it. He proposed the term, I believe.

I am aware of a move for a so-called "trivial subset".  But I do not know
what will happen.


No, this is a move in the opposite direction. It recognises that the XML
spec defines a high ground (full validation) and a low ground (no
external DTD or entities fetched) but that in practice, there is a
valuable middle ground (no validation, but all external DTDs, external
parameter entities they refer to, and external parsed entities
referenced from the instance are fetched and used for such purposes as
attribute defaulting, declaration of ID, and suchlike). In practice, it
is this middle ground which is frequently what parsers implement
and which it would be good to rely on for XML applications, yet there is
no defined name for this thing and thus no way to claim conformance to
it.

But this is not the forum for such a topic; I apologise for the
digression.

Regardless, it is legal now to use epes, and thus a rule needs to be
established for labelling them; and the rule needs to cover all legal
cases, not just some frequently occurring ones.


I think that the current rule satisfies these criteria.  The only
caveats are that (1) to know if an XML MIME entity is an XML document,
you have to parse it, and (2) even if an XML MIME entity does not
parse as an XML document, you are not sure if it is an external
parsed entity or not.


I have difficulty accepting these caveats, when there are better options
available. The sniffing procedure you describe, for one thing, seems to
mandate recovery from a fatal error.

I do not think they are problems.  As for (1), you have to parse it anyway,
since the MIME header may be wrong.


That argument could be used, in the limit, to show that there is no need
for any MIME labelling at all. It's not a good direction to take.

As for (2), are there any requirements to
distinguish those texts which may become external parsed entities from those
which cannot?  To me, what does not parse as an XML document is useless.


But to others, perhaps not - if it parses as an epe and is being used as
such.
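
One rough way to test whether a piece of text "parses as an epe" is to
strip any leading text declaration and wrap the rest in a synthetic root
element, then check that the result is well-formed. The Python sketch
below does this. The function name and the wrapper element name are
hypothetical, it assumes the entity is UTF-8 encoded, and it will reject
entities that reference general entities declared elsewhere, so treat it
as an approximation rather than a conformance test.

```python
import re
import xml.parsers.expat

# Matches a leading text declaration such as <?xml encoding="utf-8"?>.
# The required whitespace keeps it from matching e.g. <?xml-stylesheet ...?>.
_TEXT_DECL = re.compile(rb"^<\?xml\s[^?]*\?>")


def parses_as_epe(data: bytes) -> bool:
    """Approximate check that data could serve as an external parsed entity."""
    body = _TEXT_DECL.sub(b"", data, count=1)
    # Hypothetical wrapper element: gives the content a single root.
    wrapped = b"<wrapper>" + body + b"</wrapper>"
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(wrapped, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

Text such as "hello <b>x</b> world" passes this check even though it is
not a well-formed document, while an unclosed tag or a bare ampersand
fails it.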

So, to be clear, I am suggesting

a) application/xml for XML files. All are required to be well formed, as
per the XML specification; otherwise it is a fatal error.
b) application/xml-epe for external parsed entities which are not
themselves well-formed instances

By the way, if there is a strong reason for introducing a specialized
media type for external parsed entities, we also need another media
type for external *parameter* entities.


Yes. Which would then mean that -epe would be a bad choice of name ;-)
and another one should be sought. Perhaps -pars and -pram

--
Chris