Jeff,
Converting the data to XHTML is an excellent plan, but there is still
value in being able to leave the raw data essentially untouched, at
least while you are testing the conversions. (Badly formed HTML can
come out quite mangled by such a transform.)
A temporary solution is to first rewrite each html file as
<?xml version="1.0"?>
<questionable_html_fragment><![CDATA[
original content goes here ...
]]></questionable_html_fragment>
You can then use the document() function to read these files in. (You
may have to watch out for embedded "]]>" sequences in the files, since
each one would end the CDATA section early.)
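
The wrapping step could be scripted along these lines. This is only a
rough sketch in Python, not a tested tool: the function name
wrap_html_as_cdata is made up for illustration, and it assumes the
HTML has already been read into a string and contains no characters
that XML 1.0 forbids outright (such as most control characters). It
deals with an embedded "]]>" by splitting it across two adjacent CDATA
sections, so a parser still hands back the original text.

```python
def wrap_html_as_cdata(html_text, element="questionable_html_fragment"):
    """Wrap raw HTML text in a single element whose content is CDATA.

    The function name and the default element name are illustrative only.
    """
    # An embedded "]]>" would close the CDATA section early, so split it:
    # end the section after "]]" and open a new one before ">".
    safe = html_text.replace("]]>", "]]]]><![CDATA[>")
    return (
        '<?xml version="1.0"?>\n'
        f"<{element}><![CDATA[{safe}]]></{element}>\n"
    )


if __name__ == "__main__":
    # Deliberately sloppy HTML, including a stray "]]>".
    print(wrap_html_as_cdata("<p>unclosed <b>tag & a stray ]]> marker<br>"))
```

Unlike the template above, the sketch adds no newlines just inside the
CDATA markers, so the text read back is exactly the original.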
Stan Devitt
StratumTek
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list