Re: In-Band Information Considered Harmful

Chip Salzenberg writes:

What do you mean by "inband"?  I mean "occupying space in the string".


Inband data has several properties: it occupies space in the string,
and is distinguishable from the string itself by local rules.

Suppose that utf8.pm knows about the screen width of chars (whatever
this means; for me, widths 0 and 1 are enough).
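
A minimal sketch of the "width 0 or 1" idea, in Python rather than Perl. The function name `display_width` and the choice of Unicode categories treated as zero-width are assumptions made for illustration; nothing here describes what utf8.pm actually does.

```python
import unicodedata

# Assumption: combining marks (Mn, Me) and format characters (Cf)
# occupy no screen column; everything else occupies exactly one.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}

def display_width(s):
    """Count screen columns, treating every character as width 0 or 1."""
    return sum(
        0 if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES else 1
        for ch in s
    )
```

Under these assumptions a base letter followed by a combining accent counts as one column, which is the property the in-band scheme relies on.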


Have you ever used WordPerfect (a code-based editor) and also
FrameMaker or Microsoft Word (frame-based editors)?


Nope.

You're proposing a code-based (WordPerfect-like) scheme -- where the
metadata are in-band but invisible by default.

I'm proposing a frame-based (Word-like) scheme -- where metadata are
not considered to occupy the same data stream as the content, even
conceptually.

In a code-based scheme, metadata must be handled sequentially because
they _are_ sequential (along with the content).  In a frame-based
scheme, metadata do not need to have a sequence artificially imposed
when it does not belong; but then you need to create a way to access
data in a not-particularly-sequential fashion.
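
To make the frame-based picture concrete, here is a rough Python sketch in which the content is a plain string and the metadata live in a separate list of spans. The `Span` type and the `attrs_at` helper are hypothetical names, not part of any existing module.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int  # inclusive offset into the content
    end: int    # exclusive offset into the content
    attr: str   # attribute name, e.g. "bold"

# The content never contains markup; spans only refer to it by offset.
content = "aXbXc"
spans = [Span(3, 4, "bold")]

def attrs_at(spans, pos):
    """Return the set of attributes covering offset pos."""
    return {s.attr for s in spans if s.start <= pos < s.end}
```

Note that the spans need no particular order, which is the "not-particularly-sequential" access mentioned above; the linear scan in `attrs_at` is the simplest possible lookup, not an efficient one.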


I have a Tk widget with tags, and I want to use a Perl regexp to
search for a bold letter X which follows a non-bold one.  How do you
propose to do it with non-sequential data?

I favor frame-based editors (and frame-based metadata for Perl too :-)).


The Tk widget stores its metadata separately.  The question is how to
seamlessly apply Perl text-handling abilities to these data.  My
conviction (after spending *a lot* of time alone with my brain and
this question) is that one should use inband data, and modify Perl to
handle these data transparently.

I think that the principle of "ignoring 0-width chars" maps well to
this problem domain.
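
A rough sketch of the in-band approach for the bold-X search, in Python. The two private-use code points used as zero-width markers, and the helper names `encode`, `visible_offset` and `find_bold_x_after_plain`, are assumptions for illustration only. The sketch handles a single attribute (bold) and assumes the bold ranges do not overlap.

```python
import re

BOLD_ON = "\ue000"   # hypothetical zero-width "bold starts" marker
BOLD_OFF = "\ue001"  # hypothetical zero-width "bold ends" marker
MARKERS = {BOLD_ON, BOLD_OFF}

def encode(content, bold_ranges):
    """Weave (start, end) bold ranges into the string as in-band markers."""
    out, pos = [], 0
    for start, end in sorted(bold_ranges):
        out.append(content[pos:start])
        out.append(BOLD_ON + content[start:end] + BOLD_OFF)
        pos = end
    out.append(content[pos:])
    return "".join(out)

def visible_offset(encoded, idx):
    """Translate an index in the encoded string to a content offset."""
    return sum(1 for ch in encoded[:idx] if ch not in MARKERS)

# A bold X right after a non-bold char means bold begins exactly at the X.
PATTERN = re.compile(f"[^{BOLD_ON}{BOLD_OFF}]{BOLD_ON}X")

def find_bold_x_after_plain(encoded):
    """Return the content offset of the first such X, or None."""
    m = PATTERN.search(encoded)
    return None if m is None else visible_offset(encoded, m.end() - 1)
```

The regexp has to mention the markers explicitly here; the proposal above is that the regexp engine would skip zero-width characters by itself unless asked otherwise.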

At the Conference, I was pleased to speak at length with Ted Nelson on
many subjects, and he made the point to me that one of the Xanadu
system's best features was its total separation of markup
(i.e. formatting and hyperlinks) from content.


Can you provide more context/details?


I can't ever do justice to Ted's ideas.  But his idea was a WWW-like service
where each person can create his own farm of hyperlinks -- content need not
have all of its hyperlinks included at creation time; rather, hyperlinks are
added on by people who discover/decide where it would be a good idea to link
things.  And my set of links may not be the same as your set, since your idea
of relevant connection may differ from mine.


This is how Emacs implements its markup.  It is a binary tree which
contains the attribute boundaries in the order they appear in the
buffer.
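
As an illustration of "attribute boundaries in buffer order", here is a small Python sketch that keeps the boundaries in a sorted list and looks them up by binary search. This is a stand-in for the tree, not Emacs's actual data structure; the names `boundary_positions`, `boundary_attrs` and `attrs_at` are hypothetical.

```python
from bisect import bisect_right

# Each boundary says: from this offset on, these attributes apply,
# until the next boundary.  Positions are kept in buffer order.
boundary_positions = [0, 3, 4]
boundary_attrs = [frozenset(), frozenset({"bold"}), frozenset()]

def attrs_at(pos):
    """Find the attributes in force at a non-negative offset pos."""
    idx = bisect_right(boundary_positions, pos) - 1
    return boundary_attrs[idx]
```

Lookup by position is logarithmic, but nothing in this structure helps a regexp engine that walks the text character by character, which is the mismatch described next.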

However, regular expressions do not map well to this picture.  Emacs
has 3 different notions of search: by REx, by syntax (find matching
paren etc.) and by text attributes.  There is no simple way to combine
them.

Now Perl RExen are very close to allowing you to combine syntax and
REx in your "searches"/matches.  I want to have all three seamlessly
merged.
The paradigm of using RExen for text-processing is too powerful to
be satisfied by half-measures.

In principle one can use (?{}) blocks to check for attributes, but
this is slow and roundabout.  I'm open to better suggestions, but they
should be *substantially* better than what I have now.  In particular,
substr() should be able to quickly return whatever is reasonable.
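
For comparison, a sketch of the "check attributes from code" route, written in Python. Python's `re` module has no equivalent of Perl's (?{}) blocks, so the check is done as a filter over candidate matches instead; the set `bold` of bold offsets and the function name are assumptions for illustration.

```python
import re

content = "aXbXc"
bold = {3}  # hypothetical out-of-band store: offsets of bold characters

def bold_x_after_plain(content, bold):
    """Yield offsets of each bold X that directly follows a non-bold char."""
    # The regexp only finds candidates; the attribute test runs in
    # ordinary code for every candidate, outside the regexp engine.
    for m in re.finditer(r"(?<=.)X", content, flags=re.DOTALL):
        i = m.start()
        if i in bold and (i - 1) not in bold:
            yield i
```

Every candidate match costs a trip out of the regexp engine, and the attribute condition cannot take part in backtracking or be combined with other parts of the pattern, which is roughly the objection raised above.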

Ilya