Andrew Welch wrote:
This means there are effectively two copies of the same information. We
currently use special delimiters to tell our search highlighting code
not to include the first table's header in the highlighting regex, but I
would like to use the markup instead. Can Lucene handle this (I will
check)?
I index configuration, metadata XML and content pieces with Lucene. When
searching, I take the field values from the hits and assemble them into
XML to be styled by XSL (resulting in something similar to what you
have, I think). You can even index your XSL offline for your own use :).
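
To make the "field values into XML" step concrete, here is a minimal sketch. It deliberately leaves Lucene out: the hit's stored fields are assumed to have already been copied into a plain Map, and the class name HitXml, the <hit>/<field> element names and the sample field names are all hypothetical, not what our code actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HitXml {
    // Escape the five XML special characters so field values are safe as text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Render one hit's fields as a <hit> element with one <field> child each.
    // The element and attribute names are illustrative assumptions.
    static String toXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<hit>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</hit>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical field values standing in for what a search hit returns.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Tables & headers");
        fields.put("path", "/docs/a.xml");
        System.out.println(toXml(fields));
    }
}
```

The resulting string can then be handed to whatever XSL transform you already have.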
Anyway, using this:
QueryHighlightExtractor qhe = new QueryHighlightExtractor(query,
        new StandardAnalyzer(),
        "<span class='highlight'>",
        "</span>");
will do what you want.
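
To show what the two tag arguments end up doing to the text, here is a rough stand-alone sketch in plain Java. It is not how the sandbox class is implemented (that one works from the query and the analyzer); this version just wraps whole-word, case-insensitive matches of a single term, and the TagHighlighter class and its highlight method are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagHighlighter {
    // Wrap every whole-word, case-insensitive occurrence of term in the tags.
    // Illustrative only: a real highlighter derives its terms from the query.
    static String highlight(String text, String term, String startTag, String endTag) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or match literal.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(startTag + m.group() + endTag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Searching a Lucene index is fast",
                "lucene", "<span class='highlight'>", "</span>"));
    }
}
```

Because the highlighting is applied to field text rather than to the finished page, you can simply not run it over the field that holds the table header.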
A while ago I implemented searching with XSL and found it to be extremely
resource intensive, at least in our case. Indexing with Lucene is very
fast, and searching that index is fast and light on resources. There
are many types of searches that are ridiculously hard with XSL (e.g.
fuzzy queries). Check out the contributions and sandbox on the Lucene
site - you can probably have something up and running within a day.
best,
-Rob