Re: Last Call: An IETF URN Sub-namespace for Registered Protocol Parameters to BCP

2002-07-03 11:18:45
> Keith,
>
> It seems that your objections to this proposal are based on a very
> different view of what constitutes a "resource" from that which is
> understood in circles where URIs are commonly used.  Some edge-cases
> may have been a matter for debate, but a good working approximation
> is "anything that can be identified by a URI".
>
> In other words, anything you can attach a name to is a resource.
>
> Spurred by XML and related technologies (which I assert are far more
> than mere "fashion") we are seeing URIs used for a wide range of
> purposes which are not constrained by a requirement for
> dereferencing.  The use of URIs for identifying arbitrary things is
> now a fact of life, and in some technical domains is proving to be
> extremely useful.  You claim "harm", but I recognize no such harm.

Clarification: I claim "harm" for the proposed use of *URNs* because
URNs were designed to be long-term stable names for (at least potentially)
network-accessible resources, whereas the proposal is to use them as a 
way of generating globally unique strings like UUIDs or OIDs.
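
To illustrate the distinction (the second name below is hypothetical,
not a registered URN):

   import uuid

   # A UUID URN: minted purely to be globally unique; it names
   # nothing in particular and there is nowhere to look it up.
   print(uuid.uuid4().urn)    # e.g. urn:uuid:4f1c0d9e-...

   # A registry-style URN: meant as a long-term stable name for
   # something whose definition an authority maintains.
   print("urn:ietf:params:example:some-parameter")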

> Having different syntactic contexts in which names are used will
> inevitably lead to different syntactic name forms.  I submit that
> the real challenge here is not to prevent the use of varying syntax,
> but to lock the various syntactic forms to a common semantic
> definition

Oddly enough, having different syntactic contexts also tends to cause
differences in semantic definition.  In one syntactic context the
order of elements can be significant, whereas in another it is not.
One syntactic context is designed to allow individual components to
be accessed independently of the others, while another expects the
entire resource description to be available to the consumer.  One
makes it easy to group related items; another has no way of
representing relationships between items.  The semantic definitions
tend to be influenced by these factors.
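
A concrete (invented) illustration: the same two items rendered as
flat header fields and as nested XML carry different structural
implications:

   Media-Feature: color        (flat; ordering and grouping
   Media-Feature: pix-x         carry no meaning)

   <features>                  (nested; the grouping is now part
     <feature>color</feature>   of the syntax, and tools will
     <feature>pix-x</feature>   treat it as meaningful)
   </features>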

I'm all for reuse of data models where it makes sense, but if the goal
is really to "lock the various syntactic forms to a common semantic
definition" (presumably one which is compatible with XML) then I take
strong issue with that, as the XML model is quite dysfunctional for
many purposes (as are the others; it's just that XML is the current
bandwagon).

> -- in this case, providing a way to create syntactic URI forms that
> can be bound to protocol semantics in a way that inhibits semantic
> drift between the different forms.

But such drift is almost inevitable.  You can't recast some existing
data structure in XML and use it widely and expect the meanings of the 
protocol elements to stay the same.  And in essentially every example I've 
seen of an attempt to do this, the meanings of the protocol elements are 
changed subtly from the very beginning, usually by trying to use XML
structure to represent relationships that aren't explicit in the original  
data model.  More generally, an XML representation of a data model will get 
used differently than the original representation, and the semantics of the 
individual protocol elements will almost certainly drift as a result.

(Actually this happens even when you use the same representation.
RFC 822 headers had subtly different meanings on BITNET than on the
Internet, because there were enough differences between the two user
communities and the mail-reading programs those communities used.
Similarly, casting a data model into XML means that a different set
of tools will be used to access and manipulate that data - indeed
that is the entire point of doing so - but this *will* cause semantic
drift in the data model between the two environments.)

Using URIs for the names of the data elements won't stop that kind of drift.

> One of the motivating factors in this work (for me, at least, and I
> think for others) has been to draw together some of the divergent
> strands of thinking that are taking place in the IETF and W3C.  W3C
> are fundamentally set on a course of using URIs as a generic space
> of identifiers.  IETF have a number of well-established protocols
> that use registries to allocate names.  Neither of these is going to
> change in the foreseeable future.  So do we accept a Balkanization
> of Internet standards efforts, or do we try to draw them together?

Some things don't mix very well, even if they are quite useful individually.
The traditional examples are oil and water.

> A particular case in point is content negotiation.  The IETF have
> prepared a specification for describing media features that uses a
> traditional form of IANA registry to bind names to features.  In
> parallel with this, W3C have prepared a specification which has some
> similar goals, but which uses URIs to represent media features, and
> relies on the normal URI allocation framework to ensure the minting
> of unique names as and when needed.  (I have some reservations about
> this, but that can't change what is actually happening.)

But neither do we have to endorse it just so they will use our stuff.
Especially when their using our stuff dilutes the utility of our stuff
by not requiring widespread agreement on the media features used.
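
For concreteness, the contrast looks something like this.  A feature
expression using IANA-registered tags (RFC 2533 syntax):

   (& (pix-x<=800) (color=limited))

The same features as free-standing URIs (invented examples), which
anyone can mint without coordinating with anyone:

   http://example.org/features#pix-x
   http://features.example.com/display/color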

> This URN namespace proposal will provide a way to incorporate the
> IETF feature registry directly into the W3C work, in a way which is
> traceable through IETF specifications.  Without this, I predict that
> the parties who are looking to use the W3C work (notably, mobile
> phone companies) will simply go away and invent their own set of
> media features, without any kind of clear relationship to the IETF
> features.

The W3C approach is encouraging them to do this anyway, by having all
media features be URIs that anyone can create and assign without any
agreement from anyone else.

> I also observe that IETF and W3C operate against somewhat differing
> background assumptions:  the IETF focus on wire protocols means that
> the context in which a PDU is processed is well understood, pretty
> much by definition of the protocol.  We have protocol rendezvous
> mechanisms and state machines and synchronization techniques that
> reduce the amount of explicit information that needs to be exchanged
> between parties -- this is all part of efficient protocol design.
> The work of W3C (and other designers working "over the stack") often
> depends on obviating such contextual assumptions, and in such cases
> the global (context-free) qualities of URIs are extremely valuable.
> If these layers were truly isolated from each other, this debate
> would probably never arise.  But there is genuine leakage:  client
> preferences depend on underlying hardware capabilities;  trust
> decisions may incorporate protocol addressing and other information;
> etc.  This proposal to allow IETF protocol parameter identifiers to
> be embedded in URI space is one way of controlling information in
> these cross-layer interactions.

I think it would be far more useful to think of things in terms of
the mapping/translation process rather than just assigning alternate
names to the protocol elements.  If it happens that the translation
doesn't affect the semantics of individual elements at all, that's a
good thing, but my experience suggests that it often will.  And you
don't know for sure until you start looking at how those protocol
elements will actually be used in the new environment.
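
A minimal sketch of what I mean, in Python (the tags and namespace
are invented):

   # An explicit translation table from registered protocol
   # parameter names to URI forms.  Keeping the mapping as a
   # first-class object gives you a place to document and test any
   # semantic differences, instead of pretending there are none.
   FEATURE_TO_URI = {
       "pix-x": "urn:example:media-feature:pix-x",
       "color": "urn:example:media-feature:color",
   }

   def to_uri(feature_tag):
       # Fail loudly for unregistered tags rather than minting a
       # URI on the fly.
       return FEATURE_TO_URI[feature_tag]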
 
> Another differing assumption between wire protocols and application
> data formats:  protocols are very binary -- either one is using a
> particular protocol or one is not.  The years-long Internet Fax
> debates about adapting email for real-time image transmission made
> that very clear.  It is not permissible to simply assume that a
> communicating party understands anything beyond the standardized
> protocol elements.  And there is a very clear distinction in
> protocol specifications between what is standardized and what is
> private extension.  This distinction is not so clear in application
> data formats, and while there may be a core of standardized data
> elements, it is often desirable for communities of users (or
> application designers) to agree on some common extensions -- this is
> typical of how XML application formats are deployed.  Using URIs as
> identifiers (e.g. in the case of XML, as namespace identifiers)
> allows for more flexible deployment of formats, avoiding the
> problems of "X-headers" that have for so long been a bane of IETF
> application standardization/extension efforts.

Actually they are X- headers, just globally unique ones.  I'll freely
admit that such extensibility can be useful, and that having
distributed assignment of globally unique names for extension fields
is a good idea (though I've rarely seen a conflict between X-
headers), but it's a huge stretch to say that all fields should be
defined this way.

(Also, I don't think that X- headers are a "bane" or ever have been;
they seem to cause far less harm than improper use of non X- fields; 
they're just currently out of fashion for reasons I cannot fathom)
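
To make the comparison concrete (the field name and namespace URI
below are invented):

   X-Priority: urgent

   <ext:priority xmlns:ext="http://example.org/2002/mail-ext">
     urgent
   </ext:priority>

Both are extension fields; the second merely carries a globally
unique name for the extension along with it.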

> In summary:  URIs *will* be used to identify protocol parameters.
> The IETF cannot prevent that.  What the IETF can do, by supporting a
> particular form of such use, is to try to ensure that such use
> remains bound by a clear, authoritative chain of specifications to
> the IETF specification of what such parameters mean.  The harm that
> comes from not doing this, in my view, is that we end up with a
> multiplicity of URIs that mean nearly, but not quite, the same thing
> as an IETF protocol parameter.  That outcome, I submit, cannot be
> good for longer-term interoperability between IETF and other
> organizations' specifications.

The likely consequence of what is being proposed is that the URIs we
define will mean nearly, but not quite, the same thing as an IETF
protocol parameter - while we have to pretend that they mean exactly
the same thing.  And that will degrade interoperability.

 
> > d) embed NO visible structure in the URNs - just assign each
> >    parameter value a sequence number.  people who want to use
> >    those URNs in XML or whatever would need to look them up at
> >    IANA's web site.

> I disagree.  This requirement actively works against one of the
> motivations for using URIs in application data formats:  that there
> be a scalable framework for different organizations and persons to
> mint their own identifiers.

The fact that people want to use URIs in this way does not mean that it's
appropriate to use URNs in this way.  If people want to mint their own URNs,
then they have to follow the rules for URNs.  Those rules *do not* 
permit arbitrary organizations and persons to mint their own identifiers
without explicit delegation from a URN namespace, for very good reasons
which are consistent with URNs' purposes.

The very temptation to treat URNs as if they were as malleable as other 
URIs is part of what makes this proposal dangerous.  Since I think that
URNs *will* be widely misused if they are used for protocol elements, 
I'd far rather have IANA assign ordinary URIs for this - then we will
still get semantic drift but at least it won't dilute the value of URNs.
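
To be concrete about what (d) would look like, the URNs might be
nothing more than (invented examples):

   urn:ietf:params:1547
   urn:ietf:params:1548

Nothing in the string tells you what the parameter is; you consult
the IANA registry to find its defining specification.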

> To use an identifier, one must:
>
> (i) have a framework for assigning identifier values, in such a way
> that it is possible by some means for a human to locate its defining
> specification.  I can't see how to do this without exploiting a
> visible syntactic structure in the name.

ISBNs do not have a visible syntactic structure, at least, not an 
obvious one.  But they're quite frequently used to look up book information.
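
ISBNs do carry internal structure - a check digit, for instance -
but a consumer needs none of it to perform a lookup.  A sketch of the
ISBN-10 check rule in Python:

   # ISBN-10 check: the weighted digit sum (weights 10 down to 1,
   # with 'X' standing for 10) must be divisible by 11.
   def isbn10_ok(isbn):
       digits = [10 if c in "xX" else int(c)
                 for c in isbn.replace("-", "")]
       return sum(w * d for w, d in
                  zip(range(10, 0, -1), digits)) % 11 == 0

   print(isbn10_ok("0-306-40615-2"))   # True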
 
> (ii) have a framework for actually using the identifier in an
> application:  in this case, I agree that the identifier should
> generally be treated as opaque.
>
> Also, I think (d) contradicts your goal (a):  I cannot conceive of
> any scalable resolution mechanism that does not in some sense depend
> on syntactic decomposition of the name.

You should really read up on the CNRI handle system, then.  There are
a lot of things I don't like about it, but it really was designed to
have exactly this property.
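
As a toy illustration of that property, in Python (the names and
URLs are invented): resolution is keyed on the whole identifier, and
the resolver never parses the name into components.

   HANDLES = {
       "urn:ietf:params:1547": "http://example.org/registry/1547",
       "urn:ietf:params:1548": "http://example.org/registry/1548",
   }

   def resolve(name):
       # The entire string is the lookup key; no syntactic
       # decomposition is required or even possible.
       return HANDLES.get(name)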

Keith