My take on this discussion is that everyone is trying to squeeze a
multi-dimensional problem into a one-dimensional regular expression
engine.
Why not approach this problem from another angle?
I see the data/metadata as a recursive data structure, with attributes
attached to various internal and leaf nodes describing regions of
text. To make life easier, the operations would then be more akin
to set operations and array operations.
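To make the "recursive structure plus set operations" idea concrete, here is a minimal sketch in Python (used only as a stand-in). Everything in it is hypothetical: the Node class, the walk/select/covering helpers, and the choice of character offsets (start, end) to describe each node's region of text are my assumptions, not anything perl provides today.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Set

# Hypothetical node: attributes plus the region of text it describes.
# eq=False keeps identity-based equality/hashing so nodes can be set members.
@dataclass(eq=False)
class Node:
    attributes: Dict[str, str]   # e.g. {"type": "OrderList"}
    start: int                   # offset where this node's text region begins
    end: int                     # offset one past where the region ends
    children: List["Node"] = field(default_factory=list)

def walk(node: Node) -> Iterator[Node]:
    """Yield node and all of its descendants, depth-first (pre-order)."""
    yield node
    for child in node.children:
        yield from walk(child)

def select(root: Node, pred: Callable[[Node], bool]) -> Set[Node]:
    """Return the set of nodes in the tree that satisfy pred."""
    return {n for n in walk(root) if pred(n)}

def covering(root: Node, offset: int) -> Set[Node]:
    """Return the set of nodes whose text region contains offset."""
    return select(root, lambda n: n.start <= offset < n.end)

# Usage: a tiny tree, then combine two selections with set intersection.
item = Node({"type": "Item"}, 7, 24)
para = Node({"type": "Paragraph"}, 0, 24, [item])
root = Node({"type": "Document"}, 0, 24, [para])

items = select(root, lambda n: n.attributes.get("type") == "Item")
hits = covering(root, 10)
print([n.attributes["type"] for n in walk(root) if n in items & hits])  # ['Item']
```

Each query yields a set of nodes, so "and", "or" and "but not" become plain set intersection, union and difference rather than ever more elaborate regex syntax.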
So why not use another notation to access the data of interest? Just
for argument's sake (and because it is the only language I vaguely know
with the appropriate syntax/semantics), how about something PROLOGy? (Or
something more modern.)
This approach may not require any CORE perl support; it could live
entirely outside of the perl core. But as an optimization, perl could add
recursive data structures and the walking/matching algorithms.
(What follows is hand-waving, based on what I vaguely recall about PROLOG.)
# All nodes that are Ordered Lists. X will end up with the nodes
# that represent the Ordered Lists.
[X] :- [ attribute{type} eq 'OrderList' ];
# All nodes that are Ordered Lists having exactly 3 items, where one of
# the items contains the requested text.
# X will end up with the node that represents the Ordered List.
# Y will contain the node that contains the text.
[X, Y] :- [ {attribute{type} eq 'OrderList' && 3 == @{attribute{nodes}} }
, contents =~ /perl is terrific/ ];
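For comparison, here is roughly what those two queries would have to do if written by hand as tree walks. This is a sketch in Python under my own assumptions: the Node class and the helper names (walk, contents, ordered_lists, lists_with_text) are hypothetical, a node's contents are taken to be the slice of the source text between its offsets, and "one of the items" is read as one of the list's direct children. Using a generator means every (X, Y) solution is produced, much as PROLOG would backtrack through all bindings.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical node, as in the earlier sketch: attributes plus a text region.
@dataclass(eq=False)
class Node:
    attributes: Dict[str, str]
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)

def walk(node: Node) -> Iterator[Node]:
    """Yield node and all of its descendants, depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)

def contents(node: Node, text: str) -> str:
    """The region of the source text that this node describes."""
    return text[node.start:node.end]

def ordered_lists(root: Node) -> List[Node]:
    """Query 1: every X whose type attribute is 'OrderList'."""
    return [x for x in walk(root) if x.attributes.get("type") == "OrderList"]

def lists_with_text(root: Node, text: str, pattern: str) -> Iterator[Tuple[Node, Node]]:
    """Query 2: every (X, Y) where X is an OrderList with exactly 3 items
    and Y is one of those items whose contents match pattern."""
    rx = re.compile(pattern)
    for x in ordered_lists(root):
        if len(x.children) != 3:
            continue
        for y in x.children:
            if rx.search(contents(y, text)):
                yield x, y

# Usage: one ordered list of three items; the middle one holds the text.
text = "a; perl is terrific; c"
i1 = Node({"type": "Item"}, 0, 1)
i2 = Node({"type": "Item"}, 3, 19)
i3 = Node({"type": "Item"}, 21, 22)
ol = Node({"type": "OrderList"}, 0, 22, [i1, i2, i3])
root = Node({"type": "Document"}, 0, 22, [ol])

matches = list(lists_with_text(root, text, r"perl is terrific"))
```

The point of a declarative notation would be to hide exactly this kind of loop nesting behind a one-line query.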
How to actually make it return all of the matching nodes, and how to
actually specify the syntax, I'll leave to the list.
<chaim>
--
Chaim Frenkel Nonlinear Knowledge, Inc.
chaimf(_at_)pobox(_dot_)com
+1-718-236-0183