Re: In-Band Information Considered Harmful

[ Apologies in advance for extensive quoting.  It seems necessary. ]

According to Ilya Zakharevich:

Chip Salzenberg writes:

Ilya:

Chip:

Ilya:

      a) mark char or a sequence of chars (e.g. Bold)


An Emacsish system handles these OK, as I think you'll agree.  But I'd
break them down into categories (a1) marking chars [a la mass nouns]
and (a2) marking entire substrings as units [a la count nouns], since
those two subtypes are handled differently when extracting and merging.
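
To make the (a1)/(a2) distinction concrete, here is a minimal sketch of
one possible extraction policy.  Everything in it is an assumption made
for illustration: the names Span and extract are hypothetical, and the
rule "a unit mark is dropped if the extraction cuts through it" is just
one choice, not something the thread settles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # inclusive offset into the flat string
    end: int     # exclusive offset
    label: str
    unit: bool   # False: per-character mark (a1); True: whole-substring unit (a2)

def extract(text, spans, lo, hi):
    """Return text[lo:hi] plus the marks that apply to it (assumed policy)."""
    out = []
    for s in spans:
        a, b = max(s.start, lo), min(s.end, hi)
        if a >= b:
            continue                      # mark does not overlap the extraction
        if s.unit and (a, b) != (s.start, s.end):
            continue                      # a unit mark that would be cut is dropped
        out.append(Span(a - lo, b - lo, s.label, s.unit))  # rebase offsets
    return text[lo:hi], out

text = "0123456789"
spans = [Span(2, 6, "Bold", False), Span(4, 8, "URL", True)]
part, part_spans = extract(text, spans, 3, 7)
whole, whole_spans = extract(text, spans, 3, 9)
```

Under this policy the Bold mark is simply clipped to whatever characters
were extracted, while the URL mark only comes along when its entire
substring does.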


No.  The second category of yours is a particular case of the category
"c" of mine - the case of "c" where the substring coincides with the
whole string.


I disagree that your case 'c' is relevant to this example.  What I had
in mind was the <URL>...</URL> tag.  That's not (as you describe for
'c') an expression of a relationship of the substring to the larger
string.


It is.  But apparently my short description was not clear enough.
This metadatum describes a special kind of markup, where the "textual"
part (which we want to be "searchable") is organized in some
"structure" (see below, it was this "structure" thing which I tried to
shortcut to "relation of substring to the whole string").


I see your point now.  I think you were right about (a2) really being
(c) structure.  (But I do wish you'd give longer explanations by default.  :-))

As implemented in eText, the "structure" is a tree with leaves
carrying strings of textual data (and as usual, an arbitrary hash
associated to the whole structure).
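
A tree of that general shape might look like the sketch below.  This is
not eText's actual implementation; the class name Node, the per-node
attrs dictionary, and the placeholder URL are all assumptions made for
illustration.  It only shows that the searchable text can still be
recovered by concatenating the leaves.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Children are either plain strings (leaves) or nested Node objects.
    children: list = field(default_factory=list)
    # Arbitrary key/value data attached to this node (assumed to be per-node).
    attrs: dict = field(default_factory=dict)

    def text(self):
        """Concatenate all leaf strings, left to right."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

doc = Node(
    ["see ", Node(["http://example.org/"], {"tag": "URL"}), " for details"],
    {"lang": "en"},
)
flat = doc.text()   # the searchable, flat view of the tree
```

A search that spans a node boundary (for example "example.org/ for")
still works on the flat view, even though the match crosses from the URL
subtree into the following leaf.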


I'm not comfortable with changing the basic structure of string data
from a flat sequence of characters into a tree structure.

There's some potential for including support for tree-structured
metadata attached to flat strings.  But I'm only starting to wonder if
it's worthwhile and how it would look.

      b) mark a boundary between chars (e.g. Footnotes)


I'd intended this to be countable metadata (category a2) attached to a
given position but with a length of zero.  But maybe that's not enough.


It is not.  The behaviour wrt text insertion is different.  See below.


Your below text did not help me.  Please elaborate on the insertion
behavior difference.


In fact there is no insertion difference between my NB-mark example
for "c" and "b"-type markup.  There is a deletion difference: "mark"
markup (sic) survives deletion of the (enclosing) substring.  Embedded
"empty" markup is deleted with the substring it is contained within.


Ah, very good.  That will have to be addressed.  Thanks.
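
The deletion difference can be sketched as follows.  The names Point and
delete are hypothetical, and two details are assumptions rather than
anything stated in the thread: "contained within" is taken to mean
strictly inside the deleted range, and a surviving mark is moved to the
point where the deleted text used to begin.

```python
from dataclasses import dataclass

@dataclass
class Point:
    pos: int       # boundary index between characters, 0..len(text)
    label: str
    sticky: bool   # True: a "mark" that survives; False: embedded empty markup

def delete(text, points, start, end):
    """Delete text[start:end] and adjust zero-width annotations."""
    removed = end - start
    kept = []
    for p in points:
        if p.pos <= start:
            kept.append(p)                                        # before the cut
        elif p.pos >= end:
            kept.append(Point(p.pos - removed, p.label, p.sticky))  # shift left
        elif p.sticky:
            kept.append(Point(start, p.label, True))              # mark survives
        # otherwise: embedded markup strictly inside the cut goes away with it
    return text[:start] + text[end:], kept

points = [Point(4, "NB", True), Point(4, "footnote", False), Point(7, "end", False)]
new_text, kept = delete("abcdefgh", points, 2, 6)
```

Both annotations at position 4 behave identically on insertion, but only
the sticky one is still present after the surrounding "cdef" is deleted.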

There are some rules of consistency of markup.  One should define
what any "editing" operation does to the markup.


I'm not going to even think about designing markup-rule-enforcement
into the metadata infrastructure of Perl's core.

You cannot just mark numerator and denominator by different markups
- any editing operation should keep them adjacent.  This creates the
relationship between them which is referred to as "structure" above (and
is a tree in the implementation of eText).
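
As an illustration of why a tree keeps the two parts adjacent, consider
the sketch below.  The Frac class is hypothetical and is not taken from
eText; the tie-breaking rule at the numerator/denominator boundary is an
arbitrary assumption.  Because every insertion lands inside one of the
two leaves, no edit made through this interface can place text between
them.

```python
class Frac:
    """Hypothetical structure node holding a numerator and a denominator."""

    def __init__(self, num, den):
        self.parts = [num, den]

    def text(self):
        """Flat, searchable view: numerator followed by denominator."""
        return "".join(self.parts)

    def insert(self, offset, s):
        """Insert s at a flat offset; it always goes into one of the leaves."""
        num, den = self.parts
        if not 0 <= offset <= len(num) + len(den):
            raise ValueError("offset outside the fraction")
        if offset <= len(num):
            # Assumption: an insert exactly at the boundary joins the numerator.
            self.parts[0] = num[:offset] + s + num[offset:]
        else:
            k = offset - len(num)
            self.parts[1] = den[:k] + s + den[k:]

f = Frac("a+b", "c")
f.insert(1, "x")   # lands in the numerator
f.insert(5, "d")   # lands in the denominator
```

With two independent flat marks instead, an insertion at the boundary
would have to be assigned to one mark or the other by some separate rule,
which is the consistency problem described above.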


That's fine.  That's the kind of thing we'll have in modules and/or
supported with overloading and tying.  But not in the core.
-- 
Chip Salzenberg               - a.k.a. -              
<chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K