ietf

Re: Last Call: An IETF URN Sub-namespace for Registered Protocol Parameters to BCP

2002-07-03 14:36:03
At 10:57 AM 7/3/02 -0400, Keith Moore wrote:
> Spurred by XML and related technologies (which I assert are far more than
> mere "fashion") we are seeing URIs used for a wide range of purposes which
> are not constrained by a requirement for dereferencing.   The use of URIs
> for identifying arbitrary things is now a fact of life, and in some
> technical domains is proving to be extremely useful.  You claim "harm",
> but I recognize no such harm.

Clarification: I claim "harm" for the proposed use of *URNs* because
URNs were designed to be long-term stable names for (at least potentially)
network-accessible resources, whereas the proposal is to use them as a
way of generating globally unique strings like UUIDs or OIDs.

I still don't see the "harm" here.

Another way to look at this might be: they all have potentially network-retrievable representations, but not all uses depend on being able to perform the retrieval.

> Having different syntactic contexts in which names are used will inevitably
> lead to different syntactic name forms.  I submit that the real challenge
> here is not to prevent the use of varying syntax, but to lock the various
> syntactic forms to a common semantic definition

Oddly enough, having different syntactic contexts also tends to cause
differences in semantic definition.  In one syntactic context order
of elements can be significant whereas it's not in the other.  One syntactic
context is designed to allow individual components to be accessed independently
of the others while another expects the entire resource description to
be available to the consumer.  One makes it easy to group related
items; another doesn't have a way of representing relationships between
items.  The semantic definitions tend to be influenced by these factors.

I'm all for reuse of data models where it makes sense, but if the goal
is really to "lock the various syntactic forms to a common semantic
definition" (presumably one which is compatible with XML) then I take
strong issue with that, as the XML model is quite dysfunctional for
many purposes.  (as are the others, it's just that XML is the current
bandwagon)

I'm puzzled -- you appear to be arguing my point. Yes, different syntactic frameworks will (in isolation) tend to yield differing semantics. Yes, different syntactic frameworks are better suited for different purposes. But it seems to me that referring different uses to the same original definition would help to inhibit that -- and if factors like ordering or grouping are significant, then the definition will (hopefully) capture that and place constraints on the syntactic contexts for re-use.

> -- in this case, providing
> a way to create syntactic URI forms that can be bound to protocol semantics
> in a way that inhibits semantic drift between the different forms.

But such drift is almost inevitable.  You can't recast some existing
data structure in XML and use it widely and expect the meanings of the
protocol elements to stay the same.  And in essentially every example I've
seen of an attempt to do this, the meanings of the protocol elements are
changed subtly from the very beginning, usually by trying to use XML
structure to represent relationships that aren't explicit in the original
data model.  More generally, an XML representation of a data model will get
used differently than the original representation, and the semantics of the
individual protocol elements will almost certainly drift as a result.

(Actually this happens even when you use the same representation.
RFC 822 headers had subtly different meanings on BITNET than on
the Internet, because there were enough differences in the two user
communities and the mail reading programs used by those communities.
Similarly, casting a data model into XML means that a different set
of tools will be used to access/manipulate that data - indeed that
is the entire point of doing so - but this *will* cause semantic drift
in the data model between the two environments)

Using URIs for the names of the data elements won't stop that kind of drift.

But not trying to re-use existing definitions seems to be a recipe for Balkanization.

Maybe it won't work for all applications, but I think there are a substantial number of cases where re-use of existing definitions is a reasonable and desirable goal. I have two ongoing projects for which I would really like to see this URN namespace proposal approved:

(a) Distributed storage and analysis of email and other message metadata.

(b) common feature descriptions for IETF/W3C content negotiation efforts.

> One of the motivating factors in this work (for me, at least, and I think
> for others) has been to draw together some of the divergent strands of
> thinking that are taking place in the IETF and W3C.  W3C are fundamentally
> set on a course of using URIs as a generic space of identifiers. IETF have
> a number of well-established protocols that use registries to allocate
> names. Neither of these is going to change in the foreseeable future. So
> do we accept a Balkanization of Internet standards efforts, or do we try to
> draw them together?

Some things don't mix very well, even if they are quite useful individually.
The traditional examples are oil and water.

That seems like a non-argument for opposing this proposal. Even emulsions have their uses.


> A particular case in point is content negotiation.  The IETF have prepared
> a specification for describing media features that uses a traditional form
> of IANA registry to bind names to features.  In parallel with this, W3C
> have prepared a specification which has some similar goals, but which uses
> URIs to represent media features, and relies on the normal URI allocation
> framework to ensure the minting of unique names as and when needed.  (I
> have some reservations about this, but that can't change what is actually
> happening.)

But neither do we have to endorse it just so they will use our stuff.
Especially when their using our stuff dilutes the utility of our stuff
by not requiring widespread agreement on the media features used.

Come again? That seems to me to be a complete non sequitur. How can other people using our stuff dilute its utility? It is precisely in the nature of this proposal that using these URIs would be assenting to the IETF definition of their meaning.


> This URN namespace proposal will provide a way to incorporate
> the IETF feature registry directly into the W3C work, in a way which is
> traceable through IETF specifications.   Without this, I predict that the
> parties who are looking to use the W3C work (notably, mobile phone
> companies) will simply go away and invent their own set of media features,
> without any kind of clear relationship to the IETF features.

The w3c approach is encouraging them to do this anyway, by having
all media features be URIs that anyone can create/assign without any
agreement from anyone else.

So we should roll over and play dead, and pretend that interoperability doesn't matter?

Actually, that's a misrepresentation of the W3C position, which is that vocabularies gain currency through use -- the more people who use them, the more useful, and more widely used they become. (Sure, that's a generalization.) This approach seems to be very much in the spirit of the IETF I've been participating in over the past few years -- it's not our role to decide what will and will not work, but to provide an environment in which new technologies can evolve and find currency, and promote interoperability wherever we can.


> In summary: URIs *will* be used to identify protocol parameters. The IETF
> cannot prevent that.  What the IETF can do by supporting a particular form
> of such use is to try and ensure that such use remains bound by a clear,
> authoritative chain of specifications to the IETF specification of what
> such parameters mean. The harm that comes from not doing this, in my view,
> is that we end up with a multiplicity of URIs that mean nearly, but not
> quite, the same thing as an IETF protocol parameter.  That outcome, I
> submit, cannot be good for longer term interoperability between IETF and
> other organizations' specifications.

The likely consequence of what is being proposed is for the URIs that we
define to mean nearly, but not quite, the same thing as an IETF protocol
parameter - but we have to try to pretend that they mean the same thing.
And it will degrade interoperability.

Er, no: we *define* them to mean the *same* thing. If implementations play fast and loose with the defined meaning, that's nothing new.


> >d) embed NO visible structure in the URNs - just assign each
> >    parameter value a sequence number.  people who want to use
> >    those URNs in XML or whatever would need to look them up at IANA's
> >    web site.
>
> I disagree. This requirement actively works against one of the motivations
> for using URIs in application data formats;  that there be a scalable
> framework for different organizations and persons to mint their own
> identifiers.

The fact that people want to use URIs in this way does not mean that it's
appropriate to use URNs in this way.  If people want to mint their own URNs,
then they have to follow the rules for URNs.  Those rules *do not*
permit arbitrary organizations and persons to mint their own identifiers
without explicit delegation from a URN namespace, for very good reasons
which are consistent with URNs' purposes.

Ah, that's a misunderstanding. One of the reasons I favour using URNs in this way (and contrary to the often-touted W3C position) is that it provides a form of URI that is clearly *not* minted by any Tom, Dick or Harry working in isolation. The definition of any urn:ietf:... URI is subject to the IETF consensus process, so can be expected to have undergone some level of community review. My point here was that, because they conform to a common URI syntactic framework, they can be used interchangeably in some contexts with experimental and private-use identifiers. (In a sense, this might be viewed as a converse of the X-header approach: arbitrary URIs may be treated as experimental or private use, unless they are allocated within a URI namespace controlled by a recognized authority in the area of their application.)
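As a rough illustration of that converse-of-X-header idea, the sketch below matches the general "urn:<NID>:<NSS>" form of RFC 2141 and treats only identifiers under the "ietf" NID as coming from a recognized authority; everything else lands in the experimental/private-use bucket. The function name classify_identifier, the AUTHORITY_NIDS set and the two category labels are hypothetical, and the regular expression is a simplification, not a full URN validator.

```python
import re

# Simplified RFC 2141 shape: "urn:" <NID> ":" <NSS>.  Both "urn" and the NID
# are case-insensitive.  The NID starts with a letter or digit followed by up
# to 31 letters, digits or hyphens.  Not a complete validator.
_URN_RE = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE)

# Hypothetical policy: namespaces whose assignments pass through a
# recognized authority's review process.
AUTHORITY_NIDS = {"ietf"}


def classify_identifier(uri: str) -> str:
    """Return 'authority-assigned' for a URN under a recognized NID,
    otherwise 'experimental-or-private' (hypothetical categories)."""
    m = _URN_RE.match(uri)
    if m and m.group(1).lower() in AUTHORITY_NIDS:
        return "authority-assigned"
    return "experimental-or-private"
```

With this sketch, classify_identifier("urn:ietf:rfc:2141") returns "authority-assigned", while an arbitrary http: URI or a URN under some other NID is treated as experimental or private use.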


The very temptation to treat URNs as if they were as malleable as other
URIs is part of what makes this proposal dangerous.  Since I think that
URNs *will* be widely misused if they are used for protocol elements,
I'd far rather have IANA assign ordinary URIs for this - then we will
still get semantic drift but at least it won't dilute the value of URNs.

In what sense are URNs not ordinary URIs? They have particular requirements for persistence that are not shared by all URI schemes. And there is a requirement for "location independence", but what that means isn't always clear.

But mainly, the goal of this proposal is emphatically *not* to make URNs "malleable" (in the sense of, say, http: URIs which can be reassigned at will by domain owners), but to allow the introduction of some URIs that can clearly be seen to be stable and persistent.

I'd be happy for IANA to assign "ordinary URIs", assuming that by this you mean something like http://www.ietf.org/..., as long as there was a clear organizational commitment that such a URI, once allocated, would never be reallocated for any other purpose. It's the particular properties of URNs that are desired here, not any sense that they are somehow a "special" form of URIs.


> To use an identifier, one must:
>
> (i) have a framework for assigning identifier values, in such a way that it
> is possible by some means for a human to locate its defining
> specification.  I can't see how to do this without exploiting a visible
> syntactic structure in the name.

ISBNs do not have a visible syntactic structure, at least, not an
obvious one.  But they're quite frequently used to look up book information.

I understand that ISBNs aren't persistent -- they get reused. How many books are "in print" at any time? I don't think this is quite Internet scale.

Anyway, ISBNs *do* have an internal syntactic structure. From http://www.isbn.org/standards/home/isbn/us/isbnqa.asp#Q4:

[[
Does the ISBN have any meaning imbedded in the numbers?

The four parts of an ISBN are as follows:
Group or country identifier which identifies a national or geographic grouping of publishers;
Publisher identifier which identifies a particular publisher within a group;
Title identifier which identifies a particular title or edition of a title;
Check digit is the single digit at the end of the ISBN which validates the ISBN.
]]
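The check digit mentioned in that answer is the easiest piece of ISBN structure to demonstrate. Below is a minimal sketch of the ISBN-10 rule (the ten-character form in use at the time): weight the characters 10 down to 1, let a final "X" stand for 10, and require the weighted sum to be divisible by 11. The function name is illustrative only, and the sketch does not attempt to split out the group, publisher and title parts, since those are variable-length and need the registration agency's range tables.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Verify the ISBN-10 check digit; hyphens and spaces are ignored."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, c in enumerate(chars):
        weight = 10 - position  # weights 10, 9, ..., 1
        if c in "0123456789":
            value = int(c)
        elif c in "Xx" and position == 9:
            value = 10  # 'X' is only allowed as the check digit
        else:
            return False
        total += weight * value
    return total % 11 == 0
```

For example, "0-306-40615-2" passes (its weighted sum is 132 = 12 x 11), while changing the final digit to 3 makes it fail.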


> (ii) have a framework for actually using the identifier in an
> application: in this case, I agree that the identifier should generally be
> treated as opaque.
>
> Also, I think (d) contradicts your goal (a):  I cannot conceive any
> scalable resolution mechanism that does not in some sense depend on
> syntactic decomposition of the name.

You should really read up on the CNRI handle system then.  There are a lot
of things I don't like about it but it really was designed to have exactly
this property.

Based on a December 2001 article (http://www.dlib.org/dlib/december01/blanchi/12blanchi.html), it seems to me that Handles too depend on some syntactic structure to partition the search space -- based on dynamic content types and metadata schema. (I should be clear that I'm using the term syntactic structure in an abstract sense, a la McCarthy (http://www-formal.stanford.edu/jmc/towards/node12.html#SECTION000120000000000000000), rather than in the sense of a specific arrangement of characters.)

Ah yes, and according to the internet draft on handles:
  http://www.ietf.org/internet-drafts/draft-sun-handle-system-09.txt
there *is* a clear syntactic structure:
[[
 2. Handle Namespace

    Every handle consists of two parts: its naming authority, otherwise
    known as its prefix, and a unique local name under the naming
    authority, otherwise known as its suffix. The naming authority and
    local name are separated by the ASCII character "/". A handle may
    thus be defined as:

      <Handle> ::= <Handle Naming Authority> "/" <Handle Local Name>
 ]]
How each naming authority deals with scaling within its domain of authority doesn't seem to be specified.
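To make the prefix/suffix split concrete, here is a minimal sketch that separates a handle at its first "/" and then hands the local name to a per-authority table. It assumes the naming authority contains no "/" (the quoted grammar alone does not say so), so any later slashes belong to the local name. The handles and the AUTHORITY_TABLES mapping are hypothetical; the point is that the global step routes only on the naming authority, leaving each authority to scale its own local namespace however it likes.

```python
from typing import Dict, Optional, Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split '<naming authority>/<local name>' at the first '/'.

    Raises ValueError if the separator or either part is missing.
    """
    authority, sep, local_name = handle.partition("/")
    if not sep or not authority or not local_name:
        raise ValueError(f"not a handle: {handle!r}")
    return authority, local_name


# Hypothetical per-authority tables: the global step only knows authorities.
AUTHORITY_TABLES: Dict[str, Dict[str, str]] = {
    "example.a": {"doc/1": "value-for-doc-1"},
    "example.b": {"item-42": "value-for-item-42"},
}


def resolve(handle: str) -> Optional[str]:
    """Route on the naming authority, then look up the local name."""
    authority, local_name = split_handle(handle)
    table = AUTHORITY_TABLES.get(authority)
    if table is None:
        return None  # unknown naming authority
    return table.get(local_name)
```

Here split_handle("example.a/doc/1") yields ("example.a", "doc/1"), and resolve returns None for an authority it has no table for.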

(Actually, when I wrote the above, I later realized that I misspoke slightly, because some systems work in constrained contexts -- I was referring to systems operating at global Internet scale without further contextualization. But I think the general idea still holds here -- if you want to reliably and quickly dereference an identifier with Internet scope, it cannot be completely opaque.)

#g


-------------------
Graham Klyne
<GK(_at_)NineByNine(_dot_)org>