perl-unicode

Re: In-Band Information Considered Harmful

1998-10-26 20:38:36
According to Ken Fox:
Chip Salzenberg writes:
1. Imagine there's a new "meta(VALUE, NAME)" operator that extracts from
   VALUE the first metadata with the given NAME.  (Additional optional
   parameters could specify position and length.  But we'll get back to
   that later.)

Did you ever get back to the 4-argument meta()?

I've come at the problem from a different direction in my "Interface"
message.

  @a = ($page =~ /(\m{a}.+)(?{ meta($1,'a')->{href} =~ /perl\.org/i })/);

This can be written without the neat (?{}) feature:

  @a = grep { meta($_, 'a')->{href} =~ /perl\.org/i }
            ($page =~ /(?:\m{a}.)+/g);

That's cool.

I didn't see the definition of \m{} either, so here's the one I assume:
  \m{ATTR} -- zero width match that requires the current position to
              have the metadata attribute ATTR defined

I'm glad you asked.  No, I'd define it this way:

   \m{ATTR} -- states that subsequent characters can match iff they
               have metadata that matches ATTR
   \M{ATTR} -- s/have/do not have/

  meta(STR, ATTR, OFFSET, LEN)
  meta(STR, ATTR, OFFSET)
  meta(STR, ATTR)

    Find the first metadata object containing ATTR applying to STR
    between OFFSET and OFFSET+LEN, or if LEN is omitted between
    OFFSET and length(STR), or if OFFSET is omitted between 0 and
    length(STR).

    ATTR is a set expression or predicate function.
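
To make the lookup rule above concrete, here is a rough sketch in plain Perl. It is only an illustration under assumptions: the metadata is kept out-of-band as a list of span hashes (the keys name, start, end and attrs are made up, with end exclusive), "applying to STR between OFFSET and OFFSET+LEN" is read as "overlaps that window", and ATTR is limited to a plain name or a predicate sub. meta_lookup() is a hypothetical stand-in, not the proposed builtin.

```perl
use strict;
use warnings;

# Hypothetical out-of-band store: each span is
# { name => ..., start => ..., end => ... (exclusive), attrs => {...} }.
# meta_lookup() stands in for the proposed meta(STR, ATTR, OFFSET, LEN).
sub meta_lookup {
    my ($str, $spans, $attr, $offset, $len) = @_;
    $offset = 0 unless defined $offset;
    my $limit = defined $len ? $offset + $len : length($str);

    # ATTR is either a name or a predicate sub (grep-style).
    my $want = ref($attr) eq 'CODE' ? $attr : sub { $_[0]{name} eq $attr };

    for my $span (sort { $a->{start} <=> $b->{start} } @$spans) {
        next if $span->{end} <= $offset;     # entirely before the window
        last if $span->{start} >= $limit;    # sorted, so nothing later applies
        return $span if $want->($span);
    }
    return;    # nothing found: undef in scalar context
}

my $page  = 'See perl.org and cpan.org.';
my @spans = (
    { name => 'a', start => 4,  end => 12, attrs => { href => 'http://www.perl.org/' } },
    { name => 'a', start => 17, end => 25, attrs => { href => 'http://www.cpan.org/' } },
);

my $first = meta_lookup($page, \@spans, 'a');
my $next  = meta_lookup($page, \@spans, 'a', $first->{end});
my $cpan  = meta_lookup($page, \@spans, sub { $_[0]{attrs}{href} =~ /cpan/ });
my $miss  = meta_lookup($page, \@spans, 'a', 0, 4);
```

Here $first is the perl.org span, $next is the cpan.org span (found by restarting at the end of the first), $cpan is found through the predicate form, and $miss is undef because the window 0..4 ends before the first span starts.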

Hm, a la grep and map?  Hm.

This forms an iteration technique too:

  $offset = 0;
  while ($a = meta($str, 'a', $offset)) {
    $offset = $a->end;
  }

Yes, we'll need iteration.  But I wonder if we can get away with using a
single operator for iteration and non-iteration.
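
One way a single operator might cover both uses, sketched in today's Perl, is to let calling context decide: scalar context returns the first match, list context returns all of them. This is only one possible reading, not a proposal from the thread. The name meta_any() and the span layout (hashes with name, start and an exclusive end) are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical span records, as plain hashes: name, start, end (exclusive).
my @spans = (
    { name => 'a', start => 0, end => 3 },
    { name => 'b', start => 3, end => 5 },
    { name => 'a', start => 5, end => 9 },
);

# meta_any() is an invented name, not a proposed builtin.
# List context: every span named $name that ends after $offset.
# Scalar context: only the first such span (or undef).
sub meta_any {
    my ($spans, $name, $offset) = @_;
    $offset ||= 0;
    my @hits = grep { $_->{name} eq $name && $_->{end} > $offset }
               sort { $a->{start} <=> $b->{start} } @$spans;
    return wantarray ? @hits : $hits[0];
}

my $first = meta_any(\@spans, 'a');    # non-iterating use
my @all   = meta_any(\@spans, 'a');    # iterating use
my @ends  = map { $_->{end} } @all;
```

The trade-off is the usual one with wantarray: the call site has to be careful about context, since the same expression means different things in each.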

Overlapping metadata is a problem though.

Not in this area, AFAIK.

The implementation of meta() in the perl core could be really easy -- just
delegate to metadata magic.

Yes, I'm starting to see just how abstract the core support must be
(if it's necessary at all).

This would let metadata implementations using zero-width characters
co-exist with out-of-band metadata.  The same scalar could even be
blessed with metadata several times.

Yes -- this leads to the idea of metadata that lives separately and
that is tied to the variable.  But I haven't thought about how that
might work, yet.

BTW, I was wondering about the semantics of '\m{a}.+' in Chip's
example:

  @a = ($page =~ /(\m{a}.+)(?{ meta($1,'a')->{href} =~ /perl\.org/i })/);

What happens with:

  <a ...>foo</a><a ...>bar</a>

Urk!  Good catch, Ken.  This is stickier than I had suspected.
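
For the record, here is the adjacent-anchor case simulated in plain Perl, under one reading of the problem. It assumes out-of-band span hashes (made-up keys name, start, end and attrs, with end exclusive), and the hrefs are invented. If \m{a} only asks "does this character have 'a' metadata?", a greedy run swallows both anchors; splitting on the identity of the covering span keeps them apart.

```perl
use strict;
use warnings;

# Text of the two adjacent anchors, with hypothetical out-of-band spans
# (end is exclusive). The hrefs are made up for the example.
my $text  = 'foobar';
my @spans = (
    { name => 'a', start => 0, end => 3, attrs => { href => 'http://example.com/' } },
    { name => 'a', start => 3, end => 6, attrs => { href => 'http://www.perl.org/' } },
);

# First 'a' span covering character position $i, or undef.
sub a_span_at {
    my ($i) = @_;
    my ($hit) = grep { $_->{name} eq 'a' && $_->{start} <= $i && $i < $_->{end} } @spans;
    return $hit;
}

my (@by_presence, @by_identity);
my $prev;    # span covering the previous character, if any
for my $i (0 .. length($text) - 1) {
    my $span = a_span_at($i);
    if ($span) {
        my $ch = substr($text, $i, 1);
        # Presence only: a new run starts after a character with no 'a' span.
        push @by_presence, '' unless $prev;
        $by_presence[-1] .= $ch;
        # Identity: a new run also starts when the covering span changes.
        push @by_identity, '' unless $prev && $prev == $span;
        $by_identity[-1] .= $ch;
    }
    $prev = $span;
}

# What meta($1, 'a')->{href} would see for the merged run: the first span only.
my $seen_href = a_span_at(0)->{attrs}{href};
```

With presence-only matching, @by_presence holds the single run "foobar", and the href test sees only the first anchor's metadata, so the perl.org link behind "bar" is never examined. @by_identity holds "foo" and "bar" separately.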
-- 
Chip Salzenberg      - a.k.a. -      <chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K