John Macdonald writes:
Whether the attribute also applies to a piece cut out of the middle
can certainly depend upon the sort of attribute. If you extract "ll"
out of <URL>http://perl.com/foo/ll</URL> it is certainly not
appropriate to retain the URL attribute. XML has many attributes
that imply that the data has a specific structure. So, for those, it
only makes sense to retain the attributes that have *both* boundaries
included. But, for something like <b>, it makes more sense to retain
the attribute even if the data comes from the middle of the range;
that is an attribute that applies individually to each component.
Even there you'll often not want the attributes carried along,
depending upon your purpose in copying. For example, if you copy a
filename from one place into a command to execute, you don't really
want to retain the bold attribute. With an out-of-band mechanism it
hardly matters if the attribute does get copied, whereas keeping the
attribute in-band might be a nuisance.
Let us simplify the question, then, by breaking it into two operations.
One extracts a substring with all of its attributes; the other analyses
the attributes that start before the substring or end after it, and
filters out the "irrelevant" ones.
Splitting the operation in two keeps the complexity of the atoms
(almost) manageable.
In fact, the EText widget has a very similar facility implemented. ;-)
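
To make the two-step split concrete, here is a minimal sketch in Python. It is
not the EText implementation; the names Span, extract, filter_spans and the
STRUCTURAL set are hypothetical and exist only for illustration. It assumes
attributes are stored out-of-band as (name, start, end) spans over the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    name: str
    start: int                # inclusive offset into the text
    end: int                  # exclusive offset into the text
    cut_left: bool = False    # span began before the extracted piece
    cut_right: bool = False   # span continued past the extracted piece

def extract(text, spans, lo, hi):
    """Operation 1: take text[lo:hi] plus every attribute overlapping it."""
    piece = text[lo:hi]
    kept = []
    for s in spans:
        if s.start < hi and s.end > lo:  # span overlaps [lo, hi)
            kept.append(Span(s.name,
                             max(s.start, lo) - lo,
                             min(s.end, hi) - lo,
                             cut_left=s.start < lo,
                             cut_right=s.end > hi))
    return piece, kept

# Assumption: which attributes count as "structural" is a policy choice.
STRUCTURAL = {"URL"}

def filter_spans(spans, structural=STRUCTURAL):
    """Operation 2: drop structural attributes that lost a boundary."""
    return [s for s in spans
            if not (s.name in structural and (s.cut_left or s.cut_right))]
```

With a URL span over "http://perl.com/foo/ll" and a bold span over the whole
line, extracting "ll" carries both attributes along, and the filter then drops
URL but keeps b. Only the second step needs to know anything about what the
attributes mean.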
|| > In contrast, working frame-based, I only need walk the attribute tree,
|| > find the attributes that apply to the given characters, and copy them.
|| > That's O(log N) or so -- certainly better than O(N). More
|| > significantly, it requires *no* knowledge of metadata semantics.
||
|| Same for inline data. There is absolutely no difference between
|| the semantics of having metadata inline or separate. We need more
|| shallow arguments than the semantic ones.
<HTML> ... 100k bytes later ... <b>Hello</b> ... </HTML>
Retaining enclosing inline attributes does require more effort,
unless you've built an out-of-line wrapping to collect its meaning.
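
As an illustration of the extra effort with inline markup, here is a rough
Python sketch. The function enclosing_tags is hypothetical, and it assumes
simple, well-nested tags without attributes. To learn which tags enclose a
given offset, it has to scan from the start of the data and keep a stack of
open tags.

```python
import re

# Assumption: tags look like <name> or </name>, with no attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def enclosing_tags(markup, offset):
    """Return the names of the tags open at `offset`, outermost first."""
    stack = []
    for m in TAG.finditer(markup):
        if m.start() >= offset:
            break                      # everything after the offset is irrelevant
        closing, name = m.groups()
        if closing:
            if stack and stack[-1] == name:
                stack.pop()            # matching close: tag no longer encloses
        else:
            stack.append(name)         # open tag: encloses until closed
    return stack
```

For the example above, asking about the offset of "Hello" means walking past
the opening tag and the 100k bytes that follow it before the answer (HTML,
then b) is known.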
I have discussed this already. Each approach needs to implement some
nasty plots, but given the current state of discontent it is better to
discuss the semantics before discussing the implementation.
Ilya