perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 14:40:25
[Sorry to extend these branches, but a lot of people address different
       sides of my proposal.]

Felix S. Gallo writes:
The Tk widget stores metadata separately.  The question is how to
seamlessly apply Perl's text-handling abilities to these data.  My
conviction (after spending *a lot* of time alone with my brain and
this question) is to use in-band data, and to modify Perl to handle
these data transparently.

This is a lot easier to implement, and it saves you the trouble
of having to keep two disparate things around.  However, you
tightly bind one version of the meta-data to the text, and give up
the very nifty multiple-meta-datas-for-one-text idea. 

This idea is inspiring, but it is viable only with metadata for
stable (i.e. dead) data.  If the metadata is embedded, then editing
operations over the data automatically update the metadata.  This
prohibits multiple-meta-datas-for-one-text, but that idea would not
work with live text anyway.
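
A minimal sketch of the difference between the two approaches under
editing.  The two marker characters are private-use code points chosen
only for illustration (an assumption, not a proposed format), and the
out-of-band metadata is reduced to a single [start, length] pair:

```perl
use strict;
use warnings;

# Hypothetical in-band markers: two private-use code points standing in
# for "bold on" and "bold off".  Assumed for illustration only.
my ($ON, $OFF) = ("\x{E000}", "\x{E001}");

# Out-of-band: plain text plus a separate [start, length] span.
my $plain = "see the red fox";
my @span  = (8, 3);                      # covers "red"

# In-band: the same markup carried inside the string itself.
my $marked = "see the ${ON}red${OFF} fox";

# The same edit applied to both strings: insert a word near the front.
$plain  =~ s/^see/see also/;
$marked =~ s/^see/see also/;

# The separate span was never told about the edit, so it is now stale.
my $stale = substr($plain, $span[0], $span[1]);

# The in-band markers moved together with the text they surround.
my ($still_marked) = $marked =~ /$ON(.*?)$OFF/;

print "out-of-band span now covers: '$stale'\n";
print "in-band span still covers:   '$still_marked'\n";
```

Keeping the separate span correct would require every editing
operation to adjust it explicitly, once per metadata set attached to
the text.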

Further,
you lock yourself in to one particular meta-data representation.
That might itself be a good thing, but you'd have to be sure you
designed it to work sufficiently well with all of the file formats
that will ever exist -- tricky, but possible.  The separated-meta-data
approach permits you to come up with entirely novel and bizarre
meta-data formats as they emerge.

Do you want to modify the REx engine for each new format that
M$/Sun/IBM/Joe invents?  I propose one format for metadata:
unlimited-width numbers (i.e. sequences of bits) in a (modified) utf8
encoding.
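
For concreteness, here is a sketch of how a number becomes a
self-delimiting byte sequence under the plain (unmodified) UTF-8
scheme of RFC 2279, which stops at 6 bytes and 31 bits.  The function
name encode_utf8_like is made up for this example, and the
modification needed to lift the width limit is not spelled out here,
so the sketch simply refuses wider values:

```perl
use strict;
use warnings;

# Encode a non-negative integer using the classic UTF-8 layout
# (1 to 6 bytes, at most 31 payload bits).  Hypothetical helper name.
sub encode_utf8_like {
    my ($n) = @_;
    return chr($n) if $n < 0x80;         # one byte, as in ASCII

    # Upper bounds for 2-, 3-, 4-, 5- and 6-byte sequences.
    my @limits = (0x800, 0x10000, 0x200000, 0x4000000, 0x80000000);
    my $len = 2;
    for my $limit (@limits) {
        last if $n < $limit;
        $len++;
    }
    die "value too wide for this sketch\n" if $len > 6;

    # Continuation bytes carry 6 bits each, lowest bits last.
    my @tail;
    for (2 .. $len) {
        unshift @tail, 0x80 | ($n & 0x3F);
        $n >>= 6;
    }

    # The lead byte starts with $len one-bits followed by a zero bit.
    my $lead = ((0xFF << (8 - $len)) & 0xFF) | $n;
    return pack 'C*', $lead, @tail;
}

printf "%s\n", unpack 'H*', encode_utf8_like($_) for 0x41, 0xE000, 0x7FFFFFFF;
```

The lead byte announces the total length, so a scanner can skip over
such a number without understanding what it means.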

I'm sure regexps can be made to handle in-band and out-of-band
meta-data fairly seamlessly; the failure of emacs to do it does not
mean it can't be done.  I think everyone agrees that the merge needs
to be seamless, if Topaz is to have an opinion about meta-data at all.

Yes.  One of the two advantages of inline metadata is that no change
to REx syntax is needed at all: a REx can contain an embedded literal
sequence of bytes, and metadata is just such a sequence.  (The other
advantage is the automatic update when a substring of the data is
modified.)
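
A small sketch of the first point with today's Perl.  The marker is a
private-use code point picked for illustration (an assumption, not a
proposed encoding); it is interpolated into the pattern like any other
literal:

```perl
use strict;
use warnings;

# Hypothetical marker: a private-use code point standing in for "bold on".
my $BOLD_ON = "\x{E000}";

my $text = "plain red, then ${BOLD_ON}red in bold";

# The marker is an ordinary literal inside the pattern, so existing
# regex syntax is enough to ask for "red, but only where bold starts".
my @bold_hits = $text =~ /\Q$BOLD_ON\Ered/g;
my @all_hits  = $text =~ /red/g;

# At the byte level the marker is simply a fixed UTF-8 sequence.
my $bytes = $BOLD_ON;
utf8::encode($bytes);
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;

printf "bold 'red' matches: %d of %d\n", scalar @bold_hits, scalar @all_hits;
print "marker bytes: $hex\n";
```

A pattern that does not mention the marker still matches the plain
word in both places, which is where transparent handling by the engine
would have to come in.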

Oh, BTW, just in case you're not confused enough, layers also have
the valuable property of being able to be used in serial, as well as
parallel (multiple layers on one text in addition to multiple layers 
possible on one text), and further have the valuable property of being
able to selectively remove data from the text (which is more difficult
in in-band markup).  

What is wrong with

  s/\X{width=0,type=font}//g;

?
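
The \X{...} form above is proposed syntax.  As a rough approximation
in current Perl, assuming (purely for illustration) that font markers
occupy U+E000..U+E0FF and colour markers U+E100..U+E1FF, selective
removal is an ordinary substitution over a character class:

```perl
use strict;
use warnings;

# Assumed layout, for illustration only: font markers live in
# U+E000..U+E0FF and colour markers in U+E100..U+E1FF.
my $FONT_MARK = qr/[\x{E000}-\x{E0FF}]/;

my $text = "\x{E001}\x{E101}warning\x{E100}\x{E000}: disk full";

# Remove only the font layer; colour markers stay in place.
(my $no_font = $text) =~ s/$FONT_MARK//g;

# Remove every marker to recover the bare text.
(my $bare = $text) =~ s/[\x{E000}-\x{E1FF}]//g;

printf "length with all markup: %d\n", length $text;
printf "length without fonts:   %d\n", length $no_font;
print  "bare text: $bare\n";
```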

Ilya