perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 15:38:01
According to Felix S. Gallo:
Chip writes:
Felix writes:
 ($url, $plaintext) =~ /(\m{anchor})(text)/;

Could you unpack that for me?  I don't get your meaning.

A 'captured' part of a regular expression that contains metadata passes
the metadata as-is, so $url contains a reference to the HREF.

OK, I get it.  But I was trying for something more ambitious.

1. Imagine there's a new "meta(VALUE, NAME)" operator that extracts from
   VALUE the first metadata with the given NAME.  (Additional optional
   parameters could specify position and length.  But we'll get back to
   that later.)

2. Imagine that a user has encoded HTML as metadata named for the HTML
   tags, and the tag attributes are stored as Perl hashes attached to
   those metadata.

3. Imagine that the return value of (?{}) can specify success or
   failure.

4. Imagine finally that $1 etc. are available inside (?{}).

(All of these are reasonable things to imagine, IMO.)

Then this code would retrieve all of the anchors that point to
perl.org:

     @a = ($page =~ /(\m{a}.+)(?{ meta($1,'a')->{href} =~ /perl\.org/i })/);

Note that this code examines the metadata _during_ the search process.
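
For comparison, here is a rough sketch of how something similar can be
approximated in Perl as it exists today.  It is not the proposed
feature: the span list, its field names, and this meta() signature are
all assumptions made up for illustration, and the metadata check runs
after each match rather than inside the regex engine.

```perl
use strict;
use warnings;

# Assumption: metadata is kept beside the text as a list of spans, each
# with a name, an offset, a length, and an attribute hash (hypothetical).
my $page  = "See the Perl home page or the CPAN archive for details.";
my @spans = (
    { name => 'a', start => 8,  len => 14, attr => { href => 'http://www.perl.org/' } },
    { name => 'a', start => 30, len => 12, attr => { href => 'http://www.cpan.org/' } },
);

# Hypothetical meta(): the first span called NAME that covers offset POS.
sub meta {
    my ($spans, $name, $pos) = @_;
    for my $s (@$spans) {
        next unless $s->{name} eq $name;
        return $s if $pos >= $s->{start} && $pos < $s->{start} + $s->{len};
    }
    return;
}

# Walk the words of the page and keep those inside a perl.org anchor.
my @words;
while ($page =~ /(\w+)/g) {
    # Copy the capture and its offset before running another match.
    my ($word, $pos) = ($1, $-[1]);
    my $m = meta(\@spans, 'a', $pos) or next;
    push @words, $word if $m->{attr}{href} =~ /perl\.org/i;
}

print "@words\n";
```

The copy of $1 before the href test matters: a successful inner match
would otherwise reset the capture variables.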

The only representations I feel comfortable about manipulating are
in-memory multi-dimensional representations (a la Emacs buffers) where
all changes can be propagated immediately, and flattened representations
(a la XML) in which the metadata go wherever the data go and there is
no chance of their getting out of sync.
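
To make "propagated immediately" concrete, here is a minimal sketch of
an in-memory buffer in which every edit adjusts the metadata in the
same step.  The buffer layout and the insert_text() helper are
hypothetical names invented for this example, not an existing module.

```perl
use strict;
use warnings;

# Hypothetical buffer: the text and its spans live in one structure.
my %buf = (
    text  => "Visit the Perl home page today.",
    spans => [
        { name => 'a', start => 10, len => 14,
          attr => { href => 'http://www.perl.org/' } },
    ],
);

# Insert STRING at offset POS and fix up every span before returning,
# so text and metadata are never observed out of sync.
sub insert_text {
    my ($buf, $pos, $string) = @_;
    my $n = length $string;
    substr($buf->{text}, $pos, 0, $string);
    for my $s (@{ $buf->{spans} }) {
        if ($pos <= $s->{start}) {
            $s->{start} += $n;          # edit before the span: shift it
        }
        elsif ($pos < $s->{start} + $s->{len}) {
            $s->{len} += $n;            # edit inside the span: grow it
        }
    }
    return;
}

insert_text(\%buf, 0, "Please ");
my $anchor = $buf{spans}[0];
print substr($buf{text}, $anchor->{start}, $anchor->{len}), "\n";
```

If the text were edited by anything that bypassed insert_text(), the
offsets would silently go stale, which is the out-of-sync case I want
to avoid.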

Aha!  I understand your idea much more fully now.  My entire bit about
regexps was based on the idea that meta-data and text would be
separate, possibly so separate that they might be on different machines,
or kept by different entities.  Once you assume a tight coupling ("...can
be propagated immediately..." and "...no chance of their getting out of
sync...") you auto-solve a lot of problems I was trying to fix.  You also
give up the idea of multiple instances of meta-data for one plaintext,
though.

[...] It's only when you swing for the "separateness" fence that you
might try to optimize the "synchronize after unsynchronized arbitrary
edits" case.

I'm glad to have cleared up the misunderstanding.  But don't let my
withdrawal from a strong multi-meta system discourage you from working
such a thing out for yourself.

I'm trying to figure out what's best for Perl the language.  A
separate question is what people build _using_ Perl.  And I would be
very glad to see a system like the one you have been describing as a
module, maybe even a standard module.
-- 
Chip Salzenberg      - a.k.a. -      <chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K