perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 19:18:12
Chip Salzenberg writes:
 * The Perl Runtime Should Not Have To Know  *
 *   About Nesting Behaviors of Metadata     *

Why?

Because Perl should not have to know the details and quirks of HTML,
XML, and all other encoding schemes.  It's a layering issue.

This is why we need an abstraction to which all the king's men map,
with as little loss of information as possible.

      a) mark char or a sequence of chars (e.g. Bold)

An Emacsish system handles these OK, as I think you'll agree.  But I'd
break them down into categories (a1) marking chars [a la mass nouns]
and (a2) marking entire substrings as units [a la count nouns], since
those two subtypes are handled differently when extracting and merging.

No.  The second category of yours is a particular case of my category
"c": the case of "c" where the substring coincides with the whole
string.
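The merging behaviour that separates Chip's (a1) and (a2) can be sketched in a few lines of Python. This is a hypothetical model for illustration only: `Mark` and `normalize` are invented names, not part of Perl, Emacs, or any proposed API. Mass marks with the same name fuse when they touch or overlap; count marks stay separate units.

```python
from dataclasses import dataclass


# Hypothetical mark record, invented for this illustration.
@dataclass(frozen=True)
class Mark:
    start: int       # offset of the first marked char
    end: int         # offset just past the last marked char
    name: str
    countable: bool  # False = mass mark (a1), True = count mark (a2)


def normalize(marks):
    """Merge touching or overlapping mass marks that share a name.
    Count marks are returned unchanged, one entry per unit."""
    mass = sorted((m for m in marks if not m.countable),
                  key=lambda m: (m.name, m.start))
    merged = []
    for m in mass:
        last = merged[-1] if merged else None
        if last and last.name == m.name and m.start <= last.end:
            # Same property on adjacent chars: one longer run.
            merged[-1] = Mark(last.start, max(last.end, m.end), m.name, False)
        else:
            merged.append(m)
    counted = [m for m in marks if m.countable]
    return merged + counted
```

Two touching bold runs, [0,5) and [5,9), come out as a single bold run [0,9), while two touching links over the same offsets remain two links.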

      b) mark a boundary between chars (e.g. Footnotes)

I'd intended this to be countable metadata (category a2) attached to a
given position but with a length of zero.  But maybe that's not
enough. 

It is not.  The behaviour wrt text insertion is different.  See below.

In any case, it may be possible to do without this type entirely
(HTML does).

Emacs needs this (markers).  Tk's Text needs this.  I would suppose
the reason HTML does not need it is that HTML works with dead data,
where a boundary mark is synonymous with an offset.
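A minimal sketch of why a boundary mark differs from an offset, assuming a hypothetical `Buffer` class (not the actual Emacs or Tk API): the mark has to be adjusted on every insertion, and an insertion exactly at the mark forces a choice about which side of the new text the mark ends up on. Emacs lets each marker make that choice (its insertion type); the invented `advance_on_insert` flag below imitates the idea. A bare integer offset into dead data is never updated at all.

```python
class Marker:
    """Hypothetical boundary mark between two chars (category b)."""

    def __init__(self, pos, advance_on_insert=False):
        self.pos = pos
        # At its own position, does the mark stay before inserted text
        # (False) or move past it (True)?
        self.advance_on_insert = advance_on_insert


class Buffer:
    """Hypothetical editable text that keeps its markers up to date."""

    def __init__(self, text):
        self.text = text
        self.markers = []

    def mark(self, pos, advance_on_insert=False):
        m = Marker(pos, advance_on_insert)
        self.markers.append(m)
        return m

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        for m in self.markers:
            # Marks after the insertion point always shift; a mark
            # exactly at it shifts only if it advances on insertion.
            if m.pos > pos or (m.pos == pos and m.advance_on_insert):
                m.pos += len(s)
```

With `Buffer("See note.")` and two marks at offset 8 (just after "note"), inserting "s" at 8 leaves the non-advancing mark at 8, before the "s", and moves the advancing one to 9, after it. A footnote anchored to the end of the word wants the second behaviour.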

      c) mark a substring of text as having a special relationship to
       a bigger substring of text (e.g. Tables)

Countable metadata (category a2) can't merge, so multiple countable
metas that cover overlapping areas are an easy representation of
nested tables:

     +--------- outer table ----------+
     |                                |
     |        + inner table +         |
     v        v             v         v
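Chip's representation can be sketched as plain (start, end) spans, with nesting recovered from containment alone. `parent_of` is a hypothetical helper written for this illustration, and the offsets are made up to match the picture.

```python
def parent_of(spans):
    """Map each span name to the smallest other span that strictly
    contains it, or to None for a top-level span."""
    parents = {}
    for name, (start, end) in spans.items():
        best = None
        for other, (o_start, o_end) in spans.items():
            contains = o_start <= start and end <= o_end
            larger = (o_end - o_start) > (end - start)
            if other != name and contains and larger:
                # Prefer the tightest enclosing span.
                if best is None or (o_end - o_start) < (spans[best][1] - spans[best][0]):
                    best = other
        parents[name] = best
    return parents
```

For `{"outer": (0, 30), "inner": (9, 23)}` this yields `inner` nested in `outer`. It only holds while the text stays fixed; the objection that follows is about what happens once it is edited.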

Nope.  You cannot insert anything *between* the numerator and the
denominator.  You can only insert something into the numerator or into
the denominator.

Same with the cells of a table.  You can assign an attribute "cell_5_4"
to a substring, and an attribute "cell_5_5" to an adjacent substring,
but this breaks as soon as you insert anything at the boundary between
the cells: nothing says which cell the new text belongs to.
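The failure can be made concrete with a small sketch. Everything here is hypothetical: `flat_insert` models cells as attributes on ranges of one string, and its `sticky` parameter is an invented tie-break rule; `Table` models cells as separate strings owned by a container. In the flat model, which cell absorbs text typed at the shared boundary is decided by the tie-break alone; in the structured model there is no such position to type at.

```python
def flat_insert(text, spans, pos, s, sticky="left"):
    """Insert s at pos, updating {name: (start, end)} spans.
    At a boundary shared by two spans, only the 'sticky' rule
    decides which span absorbs the new text."""
    n = len(s)
    new_spans = {}
    for name, (start, end) in spans.items():
        if pos < start or (pos == start and sticky == "left"):
            start, end = start + n, end + n   # span lies after the insertion
        elif pos < end or (pos == end and sticky == "left"):
            end += n                          # span absorbs the new text
        new_spans[name] = (start, end)
    return text[:pos] + s + text[pos:], new_spans


class Table:
    """Structured model: each cell is its own string."""

    def __init__(self, cells):
        self.cells = dict(cells)  # cell name -> cell text

    def insert(self, cell, offset, s):
        # There is no "between cells" position; the caller must pick a cell.
        text = self.cells[cell]
        self.cells[cell] = text[:offset] + s + text[offset:]
```

Inserting "X" at offset 2 of "abcd" with cells [0,2) and [2,4) puts the "X" into "cell_5_4" under `sticky="left"` and into "cell_5_5" under `sticky="right"`; the text is "abXcd" either way, so the cell assignment is an accident of the rule rather than the user's intent.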

I would guess that your oversimplified view of markup comes from
working with dead data.  I think that everything you said may be
*literally* true if you are not going to modify the data in any way
whatsoever.

My deliberations on the subject were about text processing, where you
"import" data in order to modify it.

Ilya