perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 15:44:07
John Macdonald writes:
Whether the attribute also applies to a piece cut out of the middle
can certainly depend upon the sort of attribute.  If you extract "ll"
out of <URL>http://perl.com/foo/ll</URL> it is certainly not
appropriate to retain the URL attribute.  XML has many attributes
that imply that the data has a specific structure.  So, for those, it
only makes sense to retain the attributes that have *both* boundaries
included.  But, for something like <b>, it makes more sense to retain
the attribute even if the data comes from the middle of the range -
that is an attribute that applies individually to each component -
although even there you'll often not want the attributes carried
along, depending upon your purpose in copying (e.g. if you copy a
filename from one place into a command to execute, you don't really
want to retain the bold attribute - with an out-of-band mechanism it
hardly matters if the attribute does get copied, whereas an attribute
kept in-band might be a nuisance).

Let us simplify the question then: break it into two operations.  One
extracts a substring with all its attributes; the other analyses the
attributes that start before or end after the substring, and filters
out the "irrelevant" ones.

Breaking the operation in two keeps the complexity of the atoms
(almost) manageable.
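
To make the split concrete, here is a minimal sketch in Python, assuming
attributes are stored out-of-band as (name, start, end) spans.  The names
Span, extract, filter_spans and the WHOLE_RANGE_ONLY policy table are
hypothetical, invented for illustration; nothing here is taken from an
existing module or from the EText widget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    name: str                 # attribute name, e.g. "URL" or "b"
    start: int                # inclusive offset into the text
    end: int                  # exclusive offset into the text
    cut_left: bool = False    # True if an extraction clipped the start
    cut_right: bool = False   # True if an extraction clipped the end


def extract(text, spans, lo, hi):
    """Operation 1: the substring plus every attribute overlapping [lo, hi).

    No knowledge of attribute semantics is needed here; clipped spans are
    only marked as such.
    """
    out = []
    for s in spans:
        if s.start < hi and s.end > lo:          # span overlaps the range
            out.append(Span(
                s.name,
                max(s.start, lo) - lo,           # re-base onto the substring
                min(s.end, hi) - lo,
                s.cut_left or s.start < lo,
                s.cut_right or s.end > hi,
            ))
    return text[lo:hi], out


# Assumed policy: attributes that are meaningful only on their whole range.
WHOLE_RANGE_ONLY = {"URL"}


def filter_spans(spans, whole_only=WHOLE_RANGE_ONLY):
    """Operation 2: drop clipped attributes that need both boundaries."""
    return [s for s in spans
            if not (s.name in whole_only and (s.cut_left or s.cut_right))]
```

With a URL span and a b span both covering http://perl.com/foo/ll,
extracting just "ll" yields both spans marked as clipped on the left, and
filter_spans then keeps only b.  All the semantic knowledge sits in the
second step, so the first one can stay a dumb, uniform primitive.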

In fact, the EText widget has a very similar facility implemented.  ;-)

|| > In contrast, working frame-based, I only need walk the attribute tree,
|| > find the attributes that apply to the given characters, and copy them.
|| > That's O(log N) or so -- certainly better than O(N).  More
|| > significantly, it requires *no* knowledge of metadata semantics.
|| 
|| Same for inline data.  There is absolutely no difference between the
|| semantics of having metadata inline or separate.  We need more shallow
|| arguments than the semantic ones.

<HTML>  ... 100k bytes later ... <b>Hello</b>  ... </HTML>

Retaining enclosing inline attributes does require more effort,
unless you've built an out-of-line wrapping to collect its meaning.
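
As a rough illustration of that extra effort, the sketch below (Python
again) recovers the enclosing inline attributes by scanning the markup
from the very beginning up to the position of interest.  The function
name enclosing_tags and the simplified tag syntax (bare <name> and
</name>, no tag attributes, well-nested input) are assumptions made for
the example only.

```python
import re

# Simplified tag syntax, assumed for this sketch: <name> or </name> only.
TAG = re.compile(r"<(/?)(\w+)>")


def enclosing_tags(marked_up, offset):
    """Names of the tags still open at `offset`, outermost first.

    The scan starts at position 0, so the cost grows with the amount of
    text that precedes `offset`.
    """
    stack = []
    for m in TAG.finditer(marked_up):
        if m.start() >= offset:
            break                       # past the point of interest
        closing, name = m.groups()
        if closing:
            if stack and stack[-1] == name:
                stack.pop()             # matching close: attribute ends
        else:
            stack.append(name)          # open tag: attribute begins
    return stack
```

For a document shaped like the example above, asking about the position
of "Hello" walks over everything before it and returns HTML and b.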

I did discuss it already.  Each approach needs to implement some nasty
plots, but given the current state of discontent it is better to
discuss semantics before discussing the implementation.

Ilya