On 19 November 2010 13:31, Graydon <graydon@marost.ca> wrote:
> On Fri, Nov 19, 2010 at 11:38:40AM +0100, Wolfgang Laun wrote:
> > So the initially posted loop will have to be extended to iterate over
> > 90,000 files...?
>
> Yes.

Then even Michael's O(n.log(n)) might be beyond the tolerance limit.
Repeating myself: I'd do a single-pass XSLT extraction of links and
targets, followed by grep, sort -u and comm, and spend the saved time
surfing ;-)
Adding another idea: processing each file's results right after that
file has been transformed leaves you with only the remainder that is
either broken or must be matched cross-file, which might help if most
links are file-local.
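
A rough sketch of that variant, reusing the same (hypothetical)
extract-links.xsl from above:

  # per file: drop links already satisfied within that file; keep only
  # the leftovers plus all targets for a final cross-file pass
  : > unresolved.txt ; : > all-targets.txt
  for f in corpus/*.xml; do
    xsltproc extract-links.xsl "$f" > one.txt
    grep '^LINK '   one.txt | cut -d' ' -f2 | sort -u > l.txt
    grep '^TARGET ' one.txt | cut -d' ' -f2 | sort -u > t.txt
    comm -23 l.txt t.txt >> unresolved.txt      # not resolvable locally
    cat t.txt >> all-targets.txt
  done
  sort -u unresolved.txt  > u.txt
  sort -u all-targets.txt > a.txt
  comm -23 u.txt a.txt                          # the genuinely broken links

If most links really are file-local, unresolved.txt stays small and the
final sort/comm step has very little left to chew on.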
-W