ietf
[Top] [All Lists]

Re: Gen-ART LC Review of draft-bryan-metalinkhttp-19

2011-02-15 11:26:37
hi Ben, thanks for taking the time to read & critique this :)
I apologize for it needing this exhaustive review.

the new version can be found at
http://tools.ietf.org/html/draft-bryan-metalinkhttp-20

with diff including your & others suggestions
http://tools.ietf.org/rfcdiff?url2=draft-bryan-metalinkhttp-20.txt

was there supposed to be a comment along with

-- section 2, 4th paragraph: "HTTP mirror servers SHOULD share the
same ETag policy as the originating Metalink server."
?

On Fri, Feb 11, 2011 at 6:46 PM, Ben Campbell <ben(_at_)nostrum(_dot_)com> 
wrote:
I am the assigned Gen-ART reviewer for this draft. For background on Gen-ART, 
please see the FAQ at 
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments you may 
receive.

Document: draft-bryan-metalinkhttp-19
Reviewer: Ben Campbell
Review Date: 2011-02-11
IETF LC End Date: 2011-02-11

Summary:

This document is on the right track for a proposed standard, but there are 
open issues and editorial issues that should be addressed prior to 
publication.

--- Major issues:

none.

--- Minor issues:

-- Section 1, paragraph 1, first sentence:

Is this really intended as an alternative to Metalink/XML? It seems like more 
of a complementary approach than an alternative one.

well...you could use either, or both :)

"Metalink/HTTP is an alternative and complementary representation of
Metalink information,"...
?

-- section 2, third paragraph:

If they must have the same eTag policy, should this document not specify at 
least one mandatory-to-implement policy?

SHOULD share the same ETag policy.

I added:

"It is up to the administrator of the Metalink server to communicate
the details of the shared ETag policy to the administrators of the
mirror servers so that the mirror servers can be configured with the
same ETag policy."

-- section 2, last paragraph: "Metalink clients MAY make use of digital 
signatures if they are offered."

That seems too weak. We usually recommend that people check signatures on 
downloads when available. Wouldn't we want an automated system to treat this 
as at least a SHOULD?

I think this was written with the idea that most download clients
won't have the capability to check signatures, even if we would like
them to.

"Metalink clients SHOULD support checking digital signatures."

-- section 3, 4th paragraph: "[[Some organizations have many mirrors. Only 
send a few mirrors, or only use the Link header fields if Want-Digest is 
used?]]"

Open issue?

removed...

-- section 3.3, 1st paragraph:"([draft-ietf-ftpext2-hash] allows for FTP 
mirrors to be coordinated and provide file hashes)."

Do you mean to allow such FTP servers to be preferred?

removed too.

-- section 4.1 "This is particularly useful for providing metadata such as 
cryptographic hashes of parts of a file, allowing a client to recover from 
partial errors (see Section 7.1.2)."

Seems like this might warrant a SHOULD.

"Metalink servers SHOULD provide Metalink/XML files with partial file
hashes in Link header fields and Metalink clients SHOULD use them for
error recovery."

-- section 5, last paragraph: "Metalink clients MAY support the use of 
OpenPGP signatures."

MAY seems weak here. This means that the publisher of a file has no 
reasonable expectation that clients will check the signature.

changed to SHOULD.

-- section 6, 3rd paragraph: "Digest: SHA-256=..."

Example needs more context

added:
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: &lt;http://www2.example.com/example.ext&gt;; rel=duplicate


-- section 7, 4th paragraph from end: "Downloads from mirrors that do not 
have the same file size as the Metalink server are considered unusable and 
the client can deal with it as it sees fit."

What if the client sees fit to use it? Is there a normative  "client SHOULD 
NOT" implied here?

changed to

"Metalink clients SHOULD NOT use downloads from mirrors that do not
have the same file size as the Metalink server. "

-- section 7.1.1, 1st paragraph:"the requirement is that merging of ranges 
from multiple responses must be verified"

The requirement for what? Is this intended to be normative?

changed

-- section 7.1.2, 2nd paragraph:"If no partial cryptographic hashes are 
available, then the client MUST fetch the complete object from other mirrors."

Does this imply that the client also MUST fetch from other mirrors if the 
hashed are present but fail verification?

no, changed.

-- section 8, last paragraph: "The size of chunks chosen by the client should 
be sufficiently large that the chunk request header fields and reponse header 
fields represent negligible overhead, and sufficiently large that they can be 
pipelined effectively without needing a very high rate of chunk requests."

Can you provide guidance on what "sufficiently large", and  "negligible 
overhead" mean, either in terms of examples or real numbers?

"Note that Range requests impose an overhead on servers and clients need to 
be aware of that and not abuse them."

What constitutes abuse? Can you provide more specific guidance on when range 
should not be used, since all the other guidance seems to suggest using it?

removed section 8 except for this first sentence & added the next

"Note that Range requests impose an overhead on servers and clients
need to be aware of that and not abuse them. Metalink clients SHOULD
NOT make more than one concurrent Range request to each mirror server
that it downloads from."

-- section 10.2:

What can an implementor do to mitigate spoofing?

nothing, I'm aware of?

added:
As with all downloads, users should only download from trusted sources.

-- section 10.4:

This section should include a warning that clients only have a MAY level 
requirement to support signature verification.

changed to SHOULD

"Metalinks should include digital signatures, as described in Section 5."

Normative?

yes, changed to SHOULD.

-- section 11.1, [BITTORRENT]

Is bittorrent.org an acceptable SDO for the purposes of normative references? 
(I'm not saying it's not--just asking)

it was for RFC 5854.
http://tools.ietf.org/html/rfc5854#section-8.1

[draft-ietf-ftpext2-hash]  -- is this a normative downref?

removed



--- Nits/editorial comments:

-- section 2, 4th paragraph: "HTTP mirror servers SHOULD share the same ETag 
policy as the originating Metalink server."

what's wrong here?

-- section 2, 5th paragraph: "Metalink clients use the mirrors provided by a 
Metalink server with Link header fields [RFC5988]."

Sentence seems ambiguous. Do you mean "_in_ Link header fields."

yes, changed.

-- section 3, 2nd paragraph: "A brief Metalink server response with two 
mirrors only:"

Sentence fragment

fixed

-- section 3, last paragraph: "fieldss"

Typo. (Note same typo repeats throughout)

fixed

"Such a decision could be a hard-coded limit, a random selection, based on 
file size, or based on server load."

I assume this is not meant to be exhaustive?

"As some organizations can have many mirrors, it is up to the
organization to configure the amount of Link header fields the
Metalink server will provide. Such a decision could be a random
selection or a hard-coded limit based on network proximity, file size,
server load, or other factors. "

-- section 3.3, last paragraph:

Paragraph is redundant with similar normative statements in section 2

removed from 3.3

-- section 4.1: "as shown in Section 4."

.. as shown in the example in section 4.

fixed

-- section 6, 1st paragraph: "Metalink servers MUST provide Instance Digests 
in HTTP [RFC3230] for files they describe with mirrors via Link header 
fields."

Redundant with previously mentioned normative requirement.

removed

-- section 7, paragraph starting with: "If no range limit was given in the 
original request then work from the tail of the object (the first request is 
still running and will eventually catch up), otherwise continue after the 
range requested in the first request."

Implied "you". Restate as "the client works from the tail" (Please avoid use 
of 2nd person pronouns in an RFC, implied or otherwise) (Note that this usage 
is sprinkled throughout )

"If no Range was provided,"

... in the original request,...

Also, be careful with the passive voice in this paragraph. Who undertakes 
HEAD requests first? Who undertakes Range-based requests?

rewritten

   If the first request was GET and no Range header field was sent and
   the client determines later that it will issue a Range request, then
   the client SHOULD close the first connection when it catches up with
   the other parallel ranged downloads of the same object.  This means
   the first connection was sacrificed.  Metalink clients can use a HEAD
   request first, if possible, so that the client can find out if there
   are any Link header fields, and then Range-based requests are
   undertaken to the mirror servers without sacrificing a first
   connection.

-- section 7, paragraph 10: "If-Match conditions based on the ETag SHOULD be 
used..."

Clients should use If-Match conditions...

"If no indication of ETag syncronisation/ knowledge is given then If-Match 
should not be used, and optimally there will be an Instance Digest in the 
mirror response which we can use to detect a mismatch early, and if not then 
a mismatch won’t be detected until the completed object is verified."

Is there any way other than the pref parameter to indicate this? If not, it 
would be better to say "if no pref parameter is present..."

Is this sentence intended to be normative?

Also, "... The client should not use..."

Finally, the sentence is hard to follow--please break it up.

changed:

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and Metalink clients SHOULD use If-Match conditions
   based on the ETag to quickly detect out-of-date mirrors by using the
   ETag from the Metalink server response.  Optimally, the mirror server
   will include an Instance Digest in the mirror response to the client
   GET request, which the client can also use to detect a mismatch
   early.  If the mirror did not include the pref parameter or an
   Instance Digest, then a mismatch can not be detected until the
   completed object is verified.

-- section 7.1.1, general:

This whole section switches to an almost stream-of-consciousness style. It 
has very long sentences with many comma separated clauses that are hard to 
follow. It really needs to be rewritten to be more readable and precise.

hahaha :)

this is my fault as a bad editor & a non-native English speaker co-author.

-- section 7.1.1, paragraph 3:

Paragraph seems self contradictory. How is guaranteeing correct content 
different than verifying integrity?

the server sends the correct content  but there can be errors during transfer.

maybe this is not explicit enough. might need to condense some of this down.

earlier we state:
Error detection requires Instance Digests to detect errors in transfer
after the transfers have completed.

-- paragraph 4: "This guarantees that a mismatch will be detected by using 
only the synchronized ETag from a master server and mirror server, even 
alerted by the mirror servers themselves by responding with an error, 
preventing accidental merges of ranges from different versions of files with 
the same name."

I can't parse this sentence.

 This guarantees that a mismatch will be detected by using only the
   shared ETag from a Metalink server and mirror server.  Mirror servers
   will respond with an error if ETags do not match, which will prevent
   accidental merges of ranges from different versions of files with the
   same name.

"This even includes many malicious attacks where the data on the mirror has 
been replaced by some other file, but not all."

Antecedent of "this" is ambiguous

removed

-- 2nd to last paragraph:

This is the first mention of e idea of slave and master mirror servers. Is 
this the same as preferred and normal mirrors? Please use consistent terms.

ok.

-- section 7.1.2, 2nd paragraph: "If the object cryptographic hash does not 
match the Instance Digest then fetch the Metalink/XML if available, where 
partial file cryptographic hashes can be found, allowing detection of which 
server returned incorrect data."

I can't parse this sentence.

   If the cryptographic hash of the object does not match the Instance
   Digest from the Metalink server, then the client SHOULD fetch the
   Metalink/XML (if available) that could contain partial file
   cryptographic hashes which will allow detection of which mirror
   server returned incorrect data.  Metalink clients SHOULD figure out
   what ranges of the downloaded data can be recovered and what needs to
   be fetched again.

-- section 8, paragraph after bullet list:

Please avoid first person pronouns in an RFC.

"Based on laboratory experiments,..."

Reference?

removed text

"we suggest a good default number of simultaneous connections is probably 
four,"

That's very weak language--can you remove "we suggest" and "probably"?

removed

-- Section 9, 1st paragraph: "registration to the Link Relation Type registry"

Can you provide the URL to the registry?

added URL to registry

-- section 10.2 "In that case, this could deceive unaware downloaders that 
they are downloading a malicious or worthless file."

Sentence does not make sense. Does the attacker with to deceive them into 
downloading a malicious file, or convince them that they are when they are 
not?

"In that case, this could deceive unaware downloaders into downloading
a malicious or worthless file."


-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf