Re: In-Band Information Considered Harmful

According to Felix S. Gallo:

My guess is that meta-data layering is worth it, if only to avoid the
horrific HTML death spiral.


You've perfectly expressed my feelings about HTML.  Thank you.

1.  "What about searching for a combination of meta-data and data?"

If you buy into the information-as-unadorned-chars and meta-data-
as-optional-and-nonintegral-layers-of-markup philosophy, then you
find yourself rarely attempting to use meta-data as part of a search.
But it would be possible to implement another regexp flag that
flattens the data and the meta-data layers into a single bytestream.


We will undoubtedly need excellent support for flattening and layering
on demand.  But I doubt that flattening is the best approach to
regexes.  (Is it "<bold><italic>Yes!</italic></bold>", or is it
"<italic><bold>Yes!</bold></italic>" ?  Yuck.)

I think instead we'd need new metadata escapes in the RE language.
Let's call them \m{X} to require metadata tag X, and \M{X} to forbid
tag X.  e.g.:

    /\m{italic}\m{bold}Yes!/

Note that those codes impose conditions on the following text; they do not
represent embedded codes (a la Ilya or WordPerfect).  Thus any string that
would match the previous example would also match:

    /\m{italic}s/

(partial metadata specification is OK), but would NOT match:

    /\M{bold}s/

(I suppose we could get really brave and allow the {} to be a regex.  Ouch!)
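
To make those semantics concrete, here is a rough sketch in today's Perl.
Nothing in it is an existing API: the span layout ([start, end, tag], end
exclusive) and the names covered(), untouched() and meta_match() are
invented for illustration, and the require/forbid lists stand in for the
proposed \m{X} and \M{X} escapes, applied to the whole matched text.

```perl
use strict;
use warnings;

# Hypothetical model: plain text plus a separate layer of
# [start, end, tag] spans (end exclusive).
my $text  = "Oh Yes!";
my @spans = (
    [ 3, 7, 'bold'   ],    # "Yes!" is bold ...
    [ 3, 7, 'italic' ],    # ... and italic
);

# True if every character in [from, to) carries $tag.
sub covered {
    my ($spans, $tag, $from, $to) = @_;
    for my $i ($from .. $to - 1) {
        return 0
            unless grep { $_->[2] eq $tag && $_->[0] <= $i && $i < $_->[1] }
                   @$spans;
    }
    return 1;
}

# True if no character in [from, to) carries $tag.
sub untouched {
    my ($spans, $tag, $from, $to) = @_;
    my @hits = grep { $_->[2] eq $tag && $_->[0] < $to && $from < $_->[1] }
               @$spans;
    return @hits ? 0 : 1;
}

# Offset of the first match of $re whose text carries every tag in
# @$require and none of the tags in @$forbid, or -1 if there is none.
sub meta_match {
    my ($text, $spans, $re, $require, $forbid) = @_;
    while ($text =~ /$re/g) {
        my ($from, $to) = ($-[0], $+[0]);
        next if grep { !covered($spans, $_, $from, $to) }   @$require;
        next if grep { !untouched($spans, $_, $from, $to) } @$forbid;
        return $from;
    }
    return -1;
}

# Stand-ins for /\m{italic}\m{bold}Yes!/, /\m{italic}s/ and /\M{bold}s/.
my $both    = meta_match($text, \@spans, qr/Yes!/, [ 'italic', 'bold' ], []);
my $partial = meta_match($text, \@spans, qr/s/,    [ 'italic' ],         []);
my $forbid  = meta_match($text, \@spans, qr/s/,    [], [ 'bold' ]);
```

The first two calls find a match (at offsets 3 and 5), while the third
returns -1, which is the behavior described above.  Note that the order in
which the required tags are listed makes no difference.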

The only thing I don't see as obvious in this scheme is how to access
the additional information associated with a tag when matching.
/\m{a}text/ for anchored /text/ is fine, but once you've found it, how
do you access the anchor HREF -- perhaps because you're only looking
for HREFs to perl.org?  It's possible that we won't be able to express
all that in the RE engine per se, and that we'll have to escape via
(?{}) and use the Perl language primitives.
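
As a rough illustration of falling back on ordinary Perl, the sketch below
finds /Perl/ first and only then consults the metadata.  It deliberately
does the check after the match rather than inside a (?{}) block, to keep
the example simple.  The span layout ([start, end, tag, attributes]) and
the names attrs_for() and perl_org_links() are assumptions of mine, not
part of any proposal.

```perl
use strict;
use warnings;

# Hypothetical layer: [start, end, tag, \%attributes], end exclusive.
my $text  = "See Perl here and Perl there.";
my @spans = (
    [  4,  8, 'a', { href => 'http://www.perl.org/' } ],
    [ 18, 22, 'a', { href => 'http://example.com/'  } ],
);

# Attributes of the first $tag span that fully covers [from, to),
# or undef if there is none.
sub attrs_for {
    my ($spans, $tag, $from, $to) = @_;
    for my $s (@$spans) {
        return $s->[3]
            if $s->[2] eq $tag && $s->[0] <= $from && $to <= $s->[1];
    }
    return undef;
}

# Offsets of /Perl/ matches that sit inside an anchor whose HREF
# points at perl.org (or a subdomain of it).
sub perl_org_links {
    my ($text, $spans) = @_;
    my @offsets;
    while ($text =~ /Perl/g) {
        my ($from, $to) = ($-[0], $+[0]);    # save before the next match
        my $attrs = attrs_for($spans, 'a', $from, $to);
        next unless defined $attrs;
        push @offsets, $from
            if $attrs->{href} =~ m{^https?://(?:[\w-]+\.)*perl\.org(?:/|\z)};
    }
    return @offsets;
}

my @found = perl_org_links($text, \@spans);
```

Here @found holds only the offset of the first "Perl" (4), because the
second anchor points elsewhere.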

2.  "How do you know a meta-data layer is appropriate to, and
synchronized with, a given piece of content?"

How about if meta-data layers are not themselves unadorned 
monotonic bytestreams, but instead something like regular
expressions?


I appreciate your intent here, but I have a hard time imagining an
implementation that's robust enough to be trustworthy for
mission-critical purposes.  Xanadu had the luxury of being a
long-running server; Perl doesn't.  I think that the best we can do
is provide convenient primitives for adding metadata to, removing it
from, and examining it on in-memory data, plus convenient flattening
and layering facilities (with the proof-of-concept being XML, natch).
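
A minimal sketch of what such primitives might look like, assuming a layer
is just an array of [start, end, tag] spans (end exclusive).  The names
add_tag(), remove_tag(), tags_at(), escape() and flatten() are
hypothetical, and the flattener assumes tag names contain no whitespace.

```perl
use strict;
use warnings;

# Add a span carrying $tag over [from, to).
sub add_tag {
    my ($layer, $from, $to, $tag) = @_;
    push @$layer, [ $from, $to, $tag ];
    return;
}

# Remove every span carrying $tag.
sub remove_tag {
    my ($layer, $tag) = @_;
    @$layer = grep { $_->[2] ne $tag } @$layer;
    return;
}

# Sorted list of the tags that apply to the character at offset $i.
sub tags_at {
    my ($layer, $i) = @_;
    my @tags   = map  { $_->[2] }
                 grep { $_->[0] <= $i && $i < $_->[1] } @$layer;
    my @sorted = sort @tags;
    return @sorted;
}

# Escape the characters that are special in XML text.
sub escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Flatten text plus layer into XML-ish markup.  Whenever the tag set
# changes, close everything and reopen in sorted order, so the nesting
# order is canonical and overlapping spans still come out well-formed.
sub flatten {
    my ($text, $layer) = @_;
    my $out  = '';
    my @open = ();
    my $len  = length($text);
    for my $i (0 .. $len) {
        my @now = $i < $len ? tags_at($layer, $i) : ();
        if ("@now" ne "@open") {
            $out .= "</$_>" for reverse @open;
            $out .= "<$_>"  for @now;
            @open = @now;
        }
        $out .= escape(substr($text, $i, 1)) if $i < $len;
    }
    return $out;
}

my $text  = "Oh Yes!";
my @layer = ();
add_tag(\@layer, 3, 7, 'italic');
add_tag(\@layer, 3, 7, 'bold');
my $both = flatten($text, \@layer);   # Oh <bold><italic>Yes!</italic></bold>
remove_tag(\@layer, 'italic');
my $bold = flatten($text, \@layer);   # Oh <bold>Yes!</bold>
```

Because the flattener always emits tags in sorted order, the bold-inside-
italic versus italic-inside-bold question never reaches the consumer of
the flattened text.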

layer: /^(This).*(test).*(broad.*t).$/(1=bold, 2=italic, 3=link:spam.html)


It wouldn't be hard to build this kind of facility on top of the
primitives I'm thinking of providing.  You could easily make it available
as a module, for example, with no significant efficiency loss.
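
For instance, a module along these lines could turn the capture groups of
an ordinary regex into metadata spans, using the built-in @- and @+
offset arrays.  The function name layer_from_regex() and the
[start, end, tag] span layout are assumptions made for this sketch, and
the sample sentence is made up to fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: match $re against $text and turn capture group N
# into a [start, end, tag] span (end exclusive), with the tag taken from
# $tag_for->{N}.  Returns an array ref, or undef if the regex fails.
sub layer_from_regex {
    my ($text, $re, $tag_for) = @_;
    return undef unless $text =~ $re;
    my @layer;
    for my $n (sort { $a <=> $b } keys %$tag_for) {
        next unless defined $-[$n];    # group did not take part in the match
        push @layer, [ $-[$n], $+[$n], $tag_for->{$n} ];
    }
    return \@layer;
}

my $text  = "This is a test of the broadcast.";
my $layer = layer_from_regex(
    $text,
    qr/^(This).*(test).*(broad.*t).$/,
    { 1 => 'bold', 2 => 'italic', 3 => 'link:spam.html' },
);

# Render the spans as "start-end:tag" strings for inspection.
my @summary = map { "$_->[0]-$_->[1]:$_->[2]" } @$layer;
```

With this input, the three spans cover "This", "test" and "broadcast".
If the content changes so that the regex no longer matches, the helper
returns undef, which gives the caller a cheap way to notice that a layer
is out of sync with its text.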

3.  "How do you make this as invisible as possible?"

In one sense this would make life a lot easier for Perl users,
because there's currently no notion of meta-data at all in Perl,
so people have to roll it themselves in regexps, usually with
HTML.  Making it so that regexps never or rarely had meta-data
would make writing regexps a lot easier in the majority case.


Yes!  Excellent expression of my opinion #2!

The problem would be importing and exporting layered text,


There would have to be excellent facilities for people writing
their own layered text filters.

So it would be great if the perl builtins (<FILE>, print)
intuitively understood about meta-data and organized it
themselves.


If tied filehandles get more efficient -- and they'd better! --
then it'll be possible to do all that you suggest without making
changes to the behavior of the built-in operators.
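
A rough sketch of the idea with the tie interface as it exists today: a
tied handle whose READLINE strips simple <tag>...</tag> markup on the way
in and keeps the metadata on the side.  The class name LayerStrip and its
internals are made up for illustration; it handles only scalar-context
reads, does not check that close tags match their open tags, and does not
decode entities.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

package LayerStrip;    # hypothetical class name

sub TIEHANDLE {
    my ($class, $source) = @_;
    return bless { source => $source, layer => [], offset => 0 }, $class;
}

# Called for <$fh> in scalar context: read one raw line, return it
# without markup, and record [start, end, tag] spans (end exclusive)
# with offsets counted over all the plain text returned so far.
sub READLINE {
    my ($self) = @_;
    my $raw = readline($self->{source});
    return undef unless defined $raw;
    my $plain = '';
    my @open;
    for my $piece (split /(<\/?\w+>)/, $raw) {
        if ($piece =~ m{^<(\w+)>\z}) {
            push @open, [ $1, $self->{offset} + length($plain) ];
        }
        elsif ($piece =~ m{^</\w+>\z}) {
            my $start = pop(@open);
            next unless defined $start;    # ignore stray close tags
            push @{ $self->{layer} },
                [ $start->[1], $self->{offset} + length($plain), $start->[0] ];
        }
        else {
            $plain .= $piece;
        }
    }
    $self->{offset} += length($plain);
    return $plain;
}

package main;

my $data = "Oh <bold>Yes!</bold>\nplain\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my $fh = gensym();
tie *$fh, 'LayerStrip', $in;

my $first  = <$fh>;                  # markup already stripped
my $second = <$fh>;
my $layer  = tied(*$fh)->{layer};    # metadata collected so far
```

The calling code uses plain <$fh> and never sees the markup; the single
recorded span marks "Yes!" (offsets 3 to 7) as bold.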
-- 
Chip Salzenberg         - a.k.a. -         <chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K