perl-unicode

Re: In-Band Information Considered Harmful

1998-10-24 11:57:53
Chip Salzenberg writes:
As implemented in eText, the "structure" is a tree with leaves
carrying strings of textual data (and as usual, an arbitrary hash
associated to the whole structure).

I'm not comfortable with changing the basic structure of string data
from a flat sequence of characters into a tree structure.

No, there is no need to.  The tree structure sits completely in the
markup; the string part is just

        a) catenation of the leaves - with external markup;
        b) leaves joined by markup "chars" - with inband markup;
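
To make (a) and (b) concrete, here is a minimal sketch in Python (not
eText's actual code).  The tuple-based tree, the private-use code
points used as markup "chars", and both function names are
assumptions made up for illustration only.

```python
# Hypothetical tree: a node is either a str leaf or a (tag, [children]) tuple.
OPEN, CLOSE = "\ue000", "\ue001"   # assumed in-band markup "chars"

def flatten_external(node):
    """(a) Catenate the leaves; keep markup outside as (tag, start, end) spans."""
    if isinstance(node, str):
        return node, []
    tag, children = node
    text, spans = "", []
    for child in children:
        sub_text, sub_spans = flatten_external(child)
        offset = len(text)
        # Child spans are relative to the child; shift them into our text.
        spans += [(t, s + offset, e + offset) for t, s, e in sub_spans]
        text += sub_text
    spans.insert(0, (tag, 0, len(text)))
    return text, spans

def flatten_inband(node):
    """(b) Join the leaves with markup chars; tag names are dropped here."""
    if isinstance(node, str):
        return node
    _tag, children = node
    return OPEN + "".join(flatten_inband(c) for c in children) + CLOSE

# A fraction (a+b)/c as a tree with two marked-up children.
frac = ("frac", [("num", ["a+b"]), ("den", ["c"])])
```

Either way the string part stays flat; only where the tree is
recorded differs.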

There's some potential for including support for tree-structured
metadata attached to flat strings.  But I'm only starting to wonder if
it's worthwhile and how it would look.

It is ;-), and the implementation would look pretty similar to other
types of markup.

There are some rules of consistency of markup.  One should define
what any "editing" operation does to markup.

I'm not going to even think about designing markup-rule-enforcement
into the metadata infrastructure of Perl's core.

You see, you were indeed thinking about dead data only!  In fact it is
no big deal to define what happens to each markup type during editing
(at least with an inband implementation; this is the most important
advantage of having things inband).

It is only 2 (maybe 3 or 4 when you take into account the difference
between "a", "b", "c") bits of info per markup unit:

   1) where it "sticks" (right or left) when a substring is inserted
      at the same position
          (needed with out-of-band implementation only, or if the
          insert position is specified not in terms of "length", but
          in terms of "width" with inband 0-width markup implementation);

   2) behaviour wrt deletion;

   3) Maybe: behaviour wrt becoming empty;

   4) Maybe: behaviour when two are adjacent.

I did not implement "3" with eText widget's blocks, thus there may be
some details I'm missing.  It may happen that these 4 bits describe
completely the difference between all the types "a", "b", "c", so only
a type "c" with 4 configuration bits is actually needed.
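
As a rough illustration of bits 1 and 2 in the out-of-band case, here
is a Python sketch.  The Mark record, its field names, and the
insert/delete helpers are all hypothetical; bits 3 and 4 are left out,
and a mark sitting exactly on either edge of a deleted range is
assumed to survive.

```python
from dataclasses import dataclass, replace

@dataclass
class Mark:
    pos: int              # offset into the flat string
    sticks_right: bool    # bit 1: on insert at pos, stay with the text to the right
    dies_on_delete: bool  # bit 2: dropped if strictly inside a deleted range

def insert(text, marks, at, s):
    """Insert s at offset `at`, moving marks according to bit 1."""
    out = []
    for m in marks:
        moves = m.pos > at or (m.pos == at and m.sticks_right)
        out.append(replace(m, pos=m.pos + len(s)) if moves else m)
    return text[:at] + s + text[at:], out

def delete(text, marks, start, end):
    """Delete text[start:end], treating covered marks according to bit 2."""
    out = []
    for m in marks:
        if m.pos >= end:                   # after the hole: shift left
            out.append(replace(m, pos=m.pos - (end - start)))
        elif m.pos > start:                # strictly inside the hole
            if not m.dies_on_delete:
                out.append(replace(m, pos=start))  # survives, collapsed
        else:                              # before the hole: untouched
            out.append(m)
    return text[:start] + text[end:], out
```

With inband 0-width markup, bit 1 is implied by which side of the
markup char the insertion lands on, which is why it is only needed in
the cases listed under item 1.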

You cannot just mark numerator and denominator by different markups
- any editing operation should keep them adjacent.  This creates the
relationship between them which is referred to as "structure" above
(and is a tree in the implementation of eText).

That's fine.  That's the kind of thing we'll have in modules and/or
supported with overloading and tying.  But not in the core.

If it were difficult to implement, then yes.  But the consistency is
preserved (almost) automatically with inband markup, and it is still
not hard to do with out-of-band markup.

Since it is *very* hard to do in a module, it should be in the core.

Ilya