perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 17:30:08
According to Ilya Zakharevich:
Chip Salzenberg writes:
 * The Perl Runtime Should Not Have To Know  *
 *   About Nesting Behaviors of Metadata     *

Why?

Because Perl should not have to know the details and quirks of HTML,
XML, and all other encoding schemes.  It's a layering issue.

That aside, your taxonomy of metadata is interesting:

 There are at least 3 types of metadata: 
      a) mark char or a sequence of chars (e.g. Bold)

An Emacsish system handles these OK, as I think you'll agree.  But I'd
break them down into categories (a1) marking chars [a la mass nouns]
and (a2) marking entire substrings as units [a la count nouns], since
those two subtypes are handled differently when extracting and merging.
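
A minimal sketch of that split, in Python purely for illustration. The `Span` type and both helper names are hypothetical, and the merge rule is an assumption about what "handled differently" means here: adjacent per-character runs (a1) coalesce, while countable units (a2) keep their identity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    tag: str

def merge_mass(spans):
    """Coalesce touching or overlapping runs of the same per-character tag (a1)."""
    merged = []
    for s in sorted(spans, key=lambda s: (s.tag, s.start)):
        last = merged[-1] if merged else None
        if last and last.tag == s.tag and s.start <= last.end:
            # Same property on adjacent chars: one longer run.
            merged[-1] = Span(last.start, max(last.end, s.end), s.tag)
        else:
            merged.append(s)
    return merged

def merge_count(spans):
    """Countable units (a2) are never coalesced; they are only ordered."""
    return sorted(spans, key=lambda s: (s.start, s.end))

bold = [Span(0, 4, "bold"), Span(4, 9, "bold")]
links = [Span(0, 4, "link"), Span(4, 9, "link")]
mass_result = merge_mass(bold)     # one run covering 0..9
count_result = merge_count(links)  # still two separate units
```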

      b) mark a boundary between chars (e.g. Footnotes)

I'd intended this to be countable metadata (category a2) attached to a
given position but with a length of zero.  But maybe that's not
enough.  In any case, it may be possible to do without this type
entirely (HTML does).
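
As a rough illustration only, a zero-length span can be modeled as a (start, end, tag) tuple whose start equals its end. The tuple layout, the helper names, and the sample text are all assumptions made for this sketch.

```python
def spans_at_boundary(spans, pos):
    """Return zero-length spans sitting exactly at boundary `pos`."""
    return [s for s in spans if s[0] == s[1] == pos]

def covered_text(text, span):
    """Return the characters a span covers; empty for a zero-length span."""
    start, end, _tag = span
    return text[start:end]

text = "See note."
spans = [
    (0, 3, "bold"),      # ordinary span covering "See"
    (8, 8, "footnote"),  # zero-length: sits between "note" and "."
]
footnotes = spans_at_boundary(spans, 8)
```

The footnote occupies a position but covers no characters, so extracting text through it yields an empty string.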

      c) mark a substring of text as having a special relationship to
         a bigger substring of text (e.g. Tables)

Countable metadata (category a2) can't merge, so multiple countable
metas that cover overlapping areas are an easy representation of
nested tables:

     +--------- outer table ----------+
     |                                |
     |        + inner table +         |
     v        v             v         v
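
The picture above can be sketched as two countable spans, one lying inside the other. This is an illustration in Python under assumed conventions: the (start, end, tag) tuples, the offsets, and the helper names are hypothetical.

```python
def contains(outer, inner):
    """True if `inner` lies entirely within `outer` (end offsets exclusive)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def nesting_depth(spans, pos):
    """Count how many spans cover the character at offset `pos`."""
    return sum(1 for start, end, _tag in spans if start <= pos < end)

# Offsets are made up to mirror the diagram; they are not from any real text.
tables = [
    (0, 34, "outer table"),
    (9, 23, "inner table"),
]
is_nested = contains(tables[0], tables[1])
depth_inside_inner = nesting_depth(tables, 10)
depth_outer_only = nesting_depth(tables, 2)
```

Because the two spans are never merged, the nesting can be recovered from their offsets alone.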


(2) Imposing metadata state tracking on every non-trivial regex is
also unacceptable.  ("Non-trivial" in this context means "using memory
parens or metadata queries".  Note also that using "$&" even once
imposes memory-paren-like behavior on _all_ regexes.)

I do not think your position on this question is supported by a lot of
deliberation.  Moreover, I do not see why a discussion of implementation
is timely now.

My design priorities do not permit ignoring these issues, even at this
early stage.

You cannot be taken seriously if you say "times 20".  That's impossible.
What is 5*log(80) in your opinion (assuming operations over *short*
strings)?  Logarithmic multipliers look small, but usually are not.

Your calculation is accurate but irrelevant.  Those constants make no
sense for the issues under discussion.
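
For reference, the value of 5*log(80) depends on the logarithm base, which the quoted text does not state. A small Python sketch that evaluates it under three common assumptions:

```python
import math

factor = 5
n = 80

# The base is not given in the discussion; these are three common readings.
estimates = {
    "natural log": factor * math.log(n),
    "log base 2": factor * math.log2(n),
    "log base 10": factor * math.log10(n),
}

for name, value in estimates.items():
    print(f"5 * {name}(80) = {value:.1f}")
```

This gives roughly 21.9, 31.6, and 9.5 respectively.
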
-- 
Chip Salzenberg      - a.k.a. -      <chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K