On Wed, 24 Oct 2012, Manger, James H wrote:
> Currently, I don't think url.spec.whatwg.org distinguishes between
> strings that are valid URLs and strings that can be interpreted as URLs
> by applying its standardised error handling. Consequently, error
> handling cannot be at the option of the software developer as you cannot
> tell which bits are error handling.
Well first, the whole point of discussions like this is to work out what
the specs _should_ say; if the specs were perfect then there wouldn't be
any need for discussion.
But second, I believe it's already Anne's intention to add to the parsing
algorithm the ability to abort whenever the URL isn't conforming; he just
hasn't done that yet because he hasn't specced what's conforming in the
first place.
On Tue, 23 Oct 2012, David Sheets wrote:
> One algorithm? There seem to be several functions...
>
> - URI reference parsing (parse : scheme -> string -> raw uri_ref)
> - URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
> - absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
> - URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)
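For concreteness, the four ML-style signatures quoted above could be sketched in Python with the standard urllib.parse module. The function bodies here are illustrative assumptions of mine, not anything defined by a spec; in particular, normalize only shows case-folding, while real normalization also covers percent-encoding, dot-segments, default ports, and more.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# parse : string -> raw uri_ref
# A "uri_ref" here is just urlsplit's 5-component SplitResult.
def parse(s):
    return urlsplit(s)

# normalize : raw uri_ref -> normal uri_ref
# Only scheme/host case-folding is shown; a real normalizer does more.
def normalize(ref):
    return ref._replace(scheme=ref.scheme.lower(), netloc=ref.netloc.lower())

# absp : normal uri_ref -> absolute uri_ref option (None when relative)
def absp(ref):
    return ref if ref.scheme else None

# resolve : absolute uri_ref -> uri_ref -> absolute uri_ref
def resolve(base, ref):
    return urlsplit(urljoin(urlunsplit(base), urlunsplit(ref)))
```

Under this decomposition, a caller chains the stages explicitly, e.g. `resolve(absp(normalize(parse(base_string))), parse(relative_string))`.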
I don't understand what your four algorithms are supposed to be.
There's just one algorithm as far as I can tell -- it takes as input an
arbitrary string and a base URL object, and returns a normalised absolute
URL object, where a "URL object" is a conceptual construct consisting of
the components scheme, userinfo, host, port, path, query, and
fragment, which can be serialised together into a string form.
(I guess you could count the serialiser as a second algorithm, in which
case there's two.)
> Anne's current draft increases the space of valid addresses.
No, Anne hasn't finished defining conformance yet. (He just started
today.)
You may be getting confused by the "invalid flag", which doesn't mean the
input is non-conforming, but means that the input is uninterpretable.
The de facto parsing rules are already complicated by de facto
requirements for handling errors, so defining those doesn't increase
complexity either (especially if such behaviour is left optional, as
discussed above).
> *parse* is separate from *normalize* is separate from checking if a
> reference is absolute (*absp*) is separate from *resolve*.
No, it doesn't have to be. That's actually a more complicated way of
looking at it than necessary, IMHO.
> Why don't we have a discussion about the functions and types involved in
> URI processing?
>
> Why don't we discuss expanding allowable alphabets and production rules?
Personally I think this kind of open-ended approach is not a good way to
write specs. Better to put forward concrete use cases, technical data,
etc., and let the spec editor take all that into account and turn it into
a standard. Arguing about precisely which alphabets are allowed, or
whether to spec something using prose or production rules, is just
bikeshedding.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'