perl-unicode

Re: In-Band Information Considered Harmful

1998-10-26 21:19:44
Chip Salzenberg writes:
> According to Ken Fox:
> > Overlapping metadata is a problem though.
>
> Not in this area, AFAIK.

How about this text:

  hello<foot><foot>fu</foot>bar</foot>

which (could) become:

  #     0123456789
  $s = "hellofubar";

  meta($s, 5, 2) = 'foot';
  meta($s, 7, 3) = 'foot';
  meta($s, 5, 0) = 'word break'; # otherwise \b fails!
  meta($s, 7, 0) = 'word break';

Calling meta($s, 'foot') will return foot ABC.  How should foot DEF
be fetched?  It applies to the same offset.  I think meta() should
always return a metadata stream (collection maybe?).  (I'm assuming
that Topaz will have more built-in data types than Perl 5.)  The
stream should include all matching metadata objects contained in the
range of the first metadata object found.

I favor streams over lists because we can mess around with the
semantics of streams whereas lists have to be compatible with Perl 5.
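
A rough Perl 5 sketch of that lookup rule, for illustration only. Topaz's meta() doesn't exist, so a plain side table of [offset, length, name] records stands in for it, and a closure stands in for a stream. The table layout, the name meta_stream, and the tie-break for "first" (lowest offset, then widest range) are all assumptions. So is recording the outer foot as (5, 5), which makes both foots start at the same offset.

```perl
use strict;
use warnings;

# Hypothetical side table standing in for meta(): [offset, length, name].
# Assumption: the outer foot is stored as (5, 5), the inner one as (5, 2).
my @table = (
    [5, 5, 'foot'],
    [5, 2, 'foot'],
    [5, 0, 'word break'],
    [7, 0, 'word break'],
);

# Return a closure that yields, one per call, every record named $name
# lying inside the range of the first match, then undef when exhausted.
# "First" here means lowest offset, widest range first (an assumption).
sub meta_stream {
    my ($table, $name) = @_;
    my @hits = sort { $a->[0] <=> $b->[0] || $b->[1] <=> $a->[1] }
               grep { $_->[2] eq $name } @$table;
    return sub { return } unless @hits;

    my $start = $hits[0][0];
    my $end   = $start + $hits[0][1];
    my @in_range = grep { $_->[0] >= $start && $_->[0] + $_->[1] <= $end }
                   @hits;
    return sub { return shift @in_range };
}

my $next = meta_stream(\@table, 'foot');
while (my $m = $next->()) {
    print "foot at offset $m->[0], length $m->[1]\n";
}
```

Because the caller pulls one record at a time, the stream can later change how it orders or lazily computes matches without breaking Perl 5 list semantics.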

BTW, creating nested metadata isn't a problem with Chip's proposal.
The example above could also become:

  #     012345
  $s = "hello";
  $t = "bar";

  meta($t, 0, 0) = { name => 'foot', data => "fu" };
  meta($s, 5, 0) = { name => 'foot', data => $t };

To fetch the nested foot, just say:

  meta(meta($s, 'foot')->{data}, 'foot');
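
The same nesting can be mocked up in Perl 5 today, as a sketch only. Here an annotated string is just a hash holding the text plus a list of records. The helpers annotated, set_meta and get_meta are made-up names, not anything from Chip's proposal.

```perl
use strict;
use warnings;

# Hypothetical annotated string: the text plus a list of metadata records.
sub annotated {
    my ($text) = @_;
    return { text => $text, meta => [] };
}

# Attach a zero-width annotation at position $pos.
sub set_meta {
    my ($str, $pos, $record) = @_;
    push @{ $str->{meta} }, { pos => $pos, %$record };
    return;
}

# Fetch the first annotation with the given name, or undef if none.
sub get_meta {
    my ($str, $name) = @_;
    my ($hit) = grep { $_->{name} eq $name } @{ $str->{meta} };
    return $hit;
}

my $s = annotated("hello");
my $t = annotated("bar");

set_meta($t, 0, { name => 'foot', data => "fu" });
set_meta($s, 5, { name => 'foot', data => $t });

# Same shape as the nested meta() call above.
my $inner = get_meta(get_meta($s, 'foot')->{data}, 'foot');
print "nested foot: $inner->{data}\n";
```

The outer foot's data is itself an annotated string, so the nested lookup is just a second get_meta on that value.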

- Ken

-- 
Ken Fox, kfox(_at_)ford(_dot_)com, (313)59-44794
------------------------------------------------------------------------
Ford Motor Company, Powertrain           | "Is this some sort of trick
Analytical Powertrain Methods Department |  question or what?" -- Calvin
C3P Implementation Section               |