
Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

2012-10-24 10:48:11
On Tue, Oct 23, 2012 at 4:51 PM, Ian Hickson <ian@hixie.ch> wrote:
On Wed, 24 Oct 2012, Christophe Lauret wrote:

As a Web developer who's had to write code multiple times to handle URIs
in very different contexts, I actually *like* the constraints in STD 66;
there are many instances where it is simpler to assume that the error
handling has been done beforehand and simply reject an invalid URI.

I think we can agree that the error handling should be, at the option of
the software developer, either to handle the input as defined by the
spec's algorithms, or to abort and not handle the input at all.

Yes, input is handled according to the specs' algorithmS.

But why not do it as a separate spec?

Having multiple specs means an implementor has to refer to multiple specs
to implement one algorithm, which is not a way to get interoperability.
Bugs creep in much faster when implementors have to switch between specs
just in the implementation of one algorithm.

One algorithm? There seem to be several functions...

- URI reference parsing (parse : scheme -> string -> raw uri_ref)
- URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
- absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
- URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)
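
As a minimal sketch of the above, here are those four functions written
as an OCaml module signature. The phantom types raw, normal, and
absolute are placeholder names mirroring the states in the list;
nothing below is taken from Anne's draft or STD 66.

  module type URI_PROCESSOR = sig
    type scheme
    type raw
    type normal
    type absolute
    type 'state uri_ref  (* a reference tagged with its processing state *)

    val parse     : scheme -> string -> raw uri_ref
    val normalize : raw uri_ref -> normal uri_ref
    val absp      : normal uri_ref -> absolute uri_ref option
    val resolve   : absolute uri_ref -> 'a uri_ref -> absolute uri_ref
  end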

Of course, some of these may be composed in any given implementation.
In the case of a/@href and img/@src, it appears that something like
(one_algorithm = (resolve base_uri) . normalize . parse (scheme
base_uri)) is in use.
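
Assuming the URI_PROCESSOR signature sketched above, that composition
could look like the following; scheme here stands for whatever accessor
the spec uses to read the base URI's scheme.

  module One_algorithm (U : URI_PROCESSOR) = struct
    (* one_algorithm = (resolve base_uri) . normalize . parse (scheme base_uri) *)
    let run ~scheme ~base_uri input =
      U.resolve base_uri (U.normalize (U.parse (scheme base_uri) input))
  end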

A good way to get interop is to thoroughly define each function and
supply implementors with test cases for each processing stage
(one_algorithm's test cases define some tests for parse, normalize,
and resolve as well).
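
To make that concrete, here is a hypothetical shape for such test
cases, again against the URI_PROCESSOR sketch; the field names and
equality parameters are mine, and the expected values would come from a
shared test suite rather than from this mail.

  module Stage_tests (U : URI_PROCESSOR) = struct
    type case = {
      scheme     : U.scheme;               (* scheme of the base URI *)
      base       : U.absolute U.uri_ref;   (* base to resolve against *)
      input      : string;                 (* reference string under test *)
      parsed     : U.raw U.uri_ref;        (* expected output of parse *)
      normalized : U.normal U.uri_ref;     (* expected output of normalize *)
      resolved   : U.absolute U.uri_ref;   (* expected output of resolve *)
    }

    (* One end-to-end case yields a check for each intermediate stage. *)
    let check ~eq_raw ~eq_normal ~eq_abs c =
      let p = U.parse c.scheme c.input in
      let n = U.normalize p in
      let r = U.resolve c.base n in
      eq_raw c.parsed p && eq_normal c.normalized n && eq_abs c.resolved r
  end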

Some systems use more than the simple function composition of web browsers...

Increasing the space of valid addresses, when the set of addressable
resources is not actually increasing, only means more complex parsing rules.

I'm not saying we should increase the space of valid addresses.

Anne's current draft increases the space of valid addresses. This
isn't obvious because Anne's draft lacks a grammar and URI component
alphabets. You support Anne's draft and its philosophy; therefore, you
are saying the space of valid addresses should be expanded.

Here is an example of a grammar extension that STD 66 disallows but
WHATWGRL allows:
<http://www.rfc-editor.org/errata_search.php?rfc=3986&eid=3330>
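
For concreteness, here is a sketch of the STD 66 pchar alphabet
(unreserved and sub-delims characters plus ":" and "@"; percent-encoded
triples are handled separately and omitted here). A character such as
'|' fails this check, so a path segment containing it unescaped is
outside the RFC 3986 grammar even where a more permissive parser
accepts it.

  let is_unreserved c =
    (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
    || (c >= '0' && c <= '9')
    || c = '-' || c = '.' || c = '_' || c = '~'

  let is_sub_delim c = String.contains "!$&'()*+,;=" c

  (* pchar = unreserved / pct-encoded / sub-delims / ":" / "@" (RFC 3986) *)
  let is_pchar c = is_unreserved c || is_sub_delim c || c = ':' || c = '@'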

The de facto parsing rules are already complicated by de facto
requirements for handling errors, so defining those doesn't increase
complexity either (especially if such behaviour is left as optional, as
discussed above.)

*parse* is separate from *normalize* is separate from checking if a
reference is absolute (*absp*) is separate from *resolve*.

Why don't we have a discussion about the functions and types involved
in URI processing?

Why don't we discuss expanding allowable alphabets and production rules?

David
