perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 14:36:24
According to Felix S. Gallo:
I think instead we'd need new metadata escapes in the RE language.
Let's call them \m{X} to require metadata tag X, and \M{X} to forbid
tag X.  e.g.:
 /\m{italic}\m{bold}Yes!/

You'd also need to specify which meta-data layer you're combining
with the plaintext layer ...

I wonder if that belongs on the left side of =~ somehow.  After all, a
given combination of data and metadata may be the target of a search,
but it also may be flattened or modified or ...
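
As a rough illustration of the \m{X} / \M{X} idea, here is a minimal sketch in Python rather than Perl. Nothing like this exists in any regex engine; the function name `find_tagged`, the per-character list of tag sets, and the rule that a tag must hold over the whole match are all assumptions made for the example. Passing the tag layer as an explicit argument is one way of answering the "which layer" question.

```python
import re

def find_tagged(text, tags, pattern, require=(), forbid=()):
    """Return (start, end) spans where `pattern` matches `text` and every
    matched character carries all `require` tags and none of the `forbid` tags.

    `tags` is a separate metadata layer (hypothetical layout): one set of
    tag names per character of `text`.
    """
    if len(text) != len(tags):
        raise ValueError("text and metadata layer are out of sync")
    require, forbid = set(require), set(forbid)
    spans = []
    for m in re.finditer(pattern, text):
        layer = tags[m.start():m.end()]
        # Assumption: the tag constraints apply to the entire matched span.
        if all(require <= t and not (forbid & t) for t in layer):
            spans.append(m.span())
    return spans

text = "Yes! Yes!"
# First "Yes!" is bold italic; the space and the second "Yes!" are plain.
tags = [{"italic", "bold"}] * 4 + [set()] * 5

styled = find_tagged(text, tags, r"Yes!", require=("italic", "bold"))
plain = find_tagged(text, tags, r"Yes!", forbid=("bold",))
```

Here `styled` picks out only the first "Yes!" and `plain` only the second, which is roughly what /\m{italic}\m{bold}Yes!/ and /\M{bold}Yes!/ would be expected to do.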

The only thing I don't see as obvious in this scheme is how to access
the additional information associated with a tag when matching.
/\m{a}text/ for anchored /text/ is fine, but once you've found it, how
do you access the anchor HREF -- perhaps because you're only looking
for HREFs to perl.org?  It's possible that we won't be able to express
all that in the RE engine per se, and that we'll have to escape via
(?{}) and use the Perl language primitives.
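
To make the "escape to the host language" option concrete, here is a small Python sketch (again, not Perl and not a real engine feature). It assumes a hypothetical anchor layer stored as (start, end, href) spans; the regex finds the text, and ordinary code then looks up the HREF and filters on the host.

```python
import re
from urllib.parse import urlparse

def anchored_matches(text, anchors, pattern):
    """Return (matched_text, href) for each match lying wholly inside an anchor.

    `anchors` is a hypothetical metadata layer: a list of (start, end, href).
    """
    found = []
    for m in re.finditer(pattern, text):
        for start, end, href in anchors:
            if start <= m.start() and m.end() <= end:
                found.append((m.group(0), href))
                break
    return found

def on_host(href, domain):
    """True if href points at `domain` or one of its subdomains."""
    host = urlparse(href).hostname or ""
    return host == domain or host.endswith("." + domain)

text = "Read the text here and the text there."
first = text.index("the text here")
second = text.index("the text there")
anchors = [
    (first, first + len("the text here"), "http://www.perl.org/"),
    (second, second + len("the text there"), "http://example.com/"),
]

# The regex does the text matching; plain code does the HREF test.
hits = [(s, h) for s, h in anchored_matches(text, anchors, r"text")
        if on_host(h, "perl.org")]
```

The point of the sketch is only that the attribute test lives outside the pattern, much as a (?{}) block would hand control back to Perl.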

how about
 ($url, $plaintext) =~ /(\m{anchor})(text)/;

Could you unpack that for me?  I don't get your meaning.

2.  "How do you know a meta-data layer is appropriate to, and
synchronized with, a given piece of content?"

How about if meta-data layers are not themselves unadorned 
monotonic bytestreams, but instead something like regular
expressions?

I appreciate your intent here, but I have a hard time imagining an
implementation that's robust enough to be trustworthy for
mission-critical purposes.

I lost the plot as soon as you said 'robust', and was in the ditch
by the time 'mission-critical' arrived on the scene.  Try it again?
In case the confusion is due to me, let me explain: regexp-style
meta-data layers use regexp syntax to avoid the anchoring
problem and the irrelevance problem.  They're as robust as
regular expressions currently are.

Yes, exactly.  A regex that fails in the middle leaves you without any
recourse for the remainder of the text.  So changing something in the
middle of the string may entirely destroy the ability to reattach
metadata from that point forward.  I don't consider that kind of
fragility acceptable.
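
The failure mode can be sketched as follows, in Python for illustration. The layout is an assumption: the metadata layer is modeled as an ordered list of (regex, tag) segments that must match the text one after another, which is only one possible reading of "regexp-style meta-data layers".

```python
import re

def attach(text, layer):
    """Reattach tags by matching each (regex, tag) segment in order.

    Returns (spans, failed_at). `failed_at` is None on success, otherwise
    the offset where matching stopped; nothing after it gets any metadata.
    """
    pos, spans = 0, []
    for pattern, tag in layer:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            return spans, pos
        if tag is not None:
            spans.append((m.start(), m.end(), tag))
        pos = m.end()
    return spans, None

# Hypothetical layer: untagged filler alternating with tagged runs.
layer = [
    (r"Hello, ", None),
    (r"world", "bold"),
    (r" and ", None),
    (r"goodbye", "italic"),
]

intact = attach("Hello, world and goodbye", layer)
edited = attach("Hello, world or goodbye", layer)
```

With the original text every tag reattaches. After a one-word edit in the middle, matching stops at the edit and the italic on "goodbye" is lost even though "goodbye" itself never changed.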

Furthermore, to whatever extent you embed the content _into_ the
metadata, you have recreated an embedded-code representation but
decided to call it 'metadata'.  Uh uh.

The only representations I feel comfortable about manipulating are
in-memory multi-dimensional representations (a la Emacs buffers) where
all changes can be propagated immediately, and flattened representations
(a la XML) in which the metadata go wherever the data go and there is
no chance of their getting out of sync.
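
A minimal sketch of the two representations, in Python and under simplifying assumptions: the class name `TaggedText` is made up, spans are assumed not to overlap, and the flattened form is generic angle-bracket markup rather than any particular XML vocabulary.

```python
from xml.sax.saxutils import escape

class TaggedText:
    """In-memory text plus a span layer; every edit updates the spans."""

    def __init__(self, text):
        self.text = text
        self.spans = []  # each entry is [start, end, tag_name]

    def tag(self, start, end, name):
        self.spans.append([start, end, name])

    def insert(self, pos, s):
        """Insert `s` at `pos` and propagate the shift to every span."""
        self.text = self.text[:pos] + s + self.text[pos:]
        n = len(s)
        for span in self.spans:
            if span[0] >= pos:
                span[0] += n
            if span[1] > pos:
                span[1] += n

    def flatten(self):
        """Write data and metadata out together (assumes no overlapping spans)."""
        out, pos = [], 0
        for start, end, name in sorted(self.spans):
            out.append(escape(self.text[pos:start]))
            out.append("<%s>%s</%s>" % (name, escape(self.text[start:end]), name))
            pos = end
        out.append(escape(self.text[pos:]))
        return "".join(out)

doc = TaggedText("Hello world")
doc.tag(6, 11, "b")
doc.insert(0, "Oh, ")   # the span on "world" shifts with the text
flat = doc.flatten()
```

While the object is alive, the span follows the edit. Once flattened, the markup travels inside the string, so there is no separate layer left to drift.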

 Xanadu had the luxury of being a
long-running server; Perl doesn't.

If you mean that you think this system has inefficiencies

No, I am concerned with the fragility of the regex approach.  Xanadu
could basically make up a multi-dimensional representation and
maintain it in perpetuity.  Perl, being a language used for transient
glue programs, does not have this luxury.  So the full glory of
separate text and metadata may be unachievable for Perl the language,
since we do not have the ability to ensure that text and metadata
remain in sync forever unless we write them out together (flat).

On the same topic, could you add to the 'wanted' list hooks for
open(SPAM, "<http://whatever.spam.org");
and open(SPAM, ">http://whatever.spam.org");
?

I think it's about time for those.
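
One shape such hooks could take is a registry keyed on URL scheme, sketched here in Python. The names `register_opener` and `open_any` are hypothetical, and the "http" handler below is a stand-in that returns canned text instead of touching the network.

```python
import io
from urllib.parse import urlsplit

_OPENERS = {}  # scheme -> callable(target, mode) returning a file-like object

def register_opener(scheme, func):
    """Install a hook for targets that begin with `scheme:`."""
    _OPENERS[scheme] = func

def open_any(target, mode="r"):
    """Open `target` via a registered scheme hook, else as a local file."""
    opener = _OPENERS.get(urlsplit(target).scheme)
    if opener is None:
        return open(target, mode)
    return opener(target, mode)

def fake_http(target, mode):
    # Stand-in handler for the example only; a real hook would issue a request.
    return io.StringIO("fetched " + target)

register_opener("http", fake_http)
body = open_any("http://whatever.spam.org").read()
```

Anything without a registered scheme still goes to the ordinary `open`, so existing file paths are unaffected.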
-- 
Chip Salzenberg      - a.k.a. -      <chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K