perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 16:01:24
Chip Salzenberg writes:
> You've missed the point.  Big time.

> Here is the point, in large friendly letters for clarity:
>
>  * The Perl Runtime Should Not Have To Know  *
>  *   About Nesting Behaviors of Metadata     *

Why?  There are at least 3 types of metadata: 

      a) marking a char or a sequence of chars;
      b) marking a boundary between chars;
      c) marking a substring of text as having a special relationship to
         a bigger substring of text.

Examples:

      a) Bold attribute;
      b) Insertion point position,
         footnote position;
      c) Tables,
         in "has 34 of power", '3' is the numerator and '4' is the
         denominator of a fraction (which may be shown as 3/4 on a
         dumb terminal).
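A minimal Python sketch of why the three kinds need different handling, assuming a hypothetical out-of-band representation. The classes CharAttr, Boundary and Relation and the function delete are illustrative names only, not any Perl API. Deleting one span of "has 34 of power" requires a separate adjustment rule for each kind:

```python
from dataclasses import dataclass

# Hypothetical out-of-band representations of the three kinds of metadata;
# all positions are character offsets into a plain string.

@dataclass
class CharAttr:              # (a) marks a char or a run of chars, e.g. bold
    start: int               # first marked char
    end: int                 # one past the last marked char
    name: str

@dataclass
class Boundary:              # (b) marks a position between chars, e.g. a footnote
    pos: int
    name: str

@dataclass
class Relation:              # (c) ties a substring to an enclosing substring
    part: tuple[int, int]    # e.g. the numerator "3"
    whole: tuple[int, int]   # e.g. the whole fraction "34"
    role: str

def delete(text, meta, lo, hi):
    """Delete text[lo:hi] and adjust each kind of metadata by its own rule."""
    n = hi - lo

    def shift(p):
        # Positions inside the deleted span collapse to lo; later ones move left.
        return p if p <= lo else max(lo, p - n)

    out = []
    for m in meta:
        if isinstance(m, CharAttr):
            s, e = shift(m.start), shift(m.end)
            if s < e:                    # a run with no chars left is dropped
                out.append(CharAttr(s, e, m.name))
        elif isinstance(m, Boundary):
            out.append(Boundary(shift(m.pos), m.name))  # a boundary always survives
        elif isinstance(m, Relation):
            part = (shift(m.part[0]), shift(m.part[1]))
            whole = (shift(m.whole[0]), shift(m.whole[1]))
            if part[0] < part[1]:        # a relation whose part vanished is dropped
                out.append(Relation(part, whole, m.role))
    return text[:lo] + text[hi:], out

meta = [CharAttr(0, 3, "bold"), Boundary(6, "footnote"),
        Relation((4, 5), (4, 6), "numerator"),
        Relation((5, 6), (4, 6), "denominator")]
new_text, new_meta = delete("has 34 of power", meta, 4, 5)  # delete the "3"
```

Deleting the "3" leaves the bold run untouched, moves the footnote boundary left, drops the numerator relation, and keeps a denominator relation for a fraction that no longer has a numerator. Deciding what that last case should mean requires knowing what a fraction is.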

You think Perl has a way to handle them in a uniform way.  I do not see
such a way.

Moreover, since there is no difference in the semantics of in-band data
and out-of-band data, I do not see how the distinction is relevant.
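To illustrate the claim that in-band and out-of-band forms carry the same information, here is a sketch that converts a single, non-nesting bold attribute between the two forms without loss. The marker characters \x01 and \x02 are arbitrary assumptions, not any real encoding scheme, and the helper names are hypothetical.

```python
# Hypothetical in-band markers; not any real encoding scheme.
BOLD_ON, BOLD_OFF = "\x01", "\x02"

def to_out_of_band(marked):
    """Split in-band text into (plain text, list of (start, end) bold runs).

    Assumes markers are well-formed: every BOLD_ON is closed by a BOLD_OFF.
    """
    plain, runs, start = [], [], None
    for ch in marked:
        if ch == BOLD_ON:
            start = len(plain)           # run begins at the next plain char
        elif ch == BOLD_OFF:
            runs.append((start, len(plain)))
            start = None
        else:
            plain.append(ch)
    return "".join(plain), runs

def to_in_band(plain, runs):
    """Re-insert markers; assumes runs are sorted and non-overlapping."""
    out, prev = [], 0
    for s, e in runs:
        out += [plain[prev:s], BOLD_ON, plain[s:e], BOLD_OFF]
        prev = e
    out.append(plain[prev:])
    return "".join(out)

marked = "a \x01bold\x02 word"
plain, runs = to_out_of_band(marked)     # ("a bold word", [(2, 6)])
```

Converting back with to_in_band(plain, runs) reproduces the marked string exactly, so for this attribute the choice of form is a representation detail.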

> I find both of these implications unacceptable.

> (1) Knowledge of specific metadata encoding schemes does not belong in
> the Perl source code.  Period.

See above.

> (2) Imposing metadata state tracking on every non-trivial regex is
> also unacceptable.  ("Non-trivial" in this context means "using memory
> parens or metadata queries".  Note also that using "$&" even once
> imposes memory-paren-like behavior on _all_ regexes.)

I do not think your position on this question is backed by much
deliberation.  Moreover, I do not see why a discussion of
implementation is timely at this point.

>> A btree approach is acceptable in the sense that *any* operation gets
>> only a small-multiplier (say, x20) slowdown.
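A sketch of where a logarithmic multiplier comes from, assuming attributes are stored out of band as sorted, non-overlapping runs. A sorted Python list searched with bisect stands in for the btree: lookup is O(log n), although, unlike a real btree, insertion into a list is O(n). The class name AttrRuns is hypothetical.

```python
import bisect
import re

class AttrRuns:
    """Out-of-band attribute runs (start, end, name), sorted by start."""

    def __init__(self, runs):
        # Runs are assumed not to overlap.
        self.runs = sorted(runs)
        self.starts = [start for start, _, _ in self.runs]

    def at(self, pos):
        """Return the name of the run covering text[pos], or None."""
        # Binary search for the last run starting at or before pos.
        i = bisect.bisect_right(self.starts, pos) - 1
        if i >= 0 and pos < self.runs[i][1]:
            return self.runs[i][2]
        return None

# A metadata query after a regex match costs one O(log n) lookup.
text = "plain BOLD plain"
runs = AttrRuns([(6, 10, "bold")])
m = re.search(r"B\w+", text)
print(runs.at(m.start()))  # prints: bold
```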

> You cannot be taken seriously if you say "times 20".  That's impossible.

What is 5*log(80) in your opinion (assuming operations over *short*
strings)?  Logarithmic multipliers look small, but usually are not.
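For reference, the arithmetic behind the question. The 5 and the 80 come from the question itself. The log base is not stated, so both common choices are computed; either way the multiplier lands in the tens, the same order as the "x20" figure.

```python
import math

# 5 and 80 are the numbers from the question above; the log base is an
# assumption, so both base 2 and the natural log are shown.
multipliers = {
    "log2": 5 * math.log2(80),
    "ln": 5 * math.log(80),
}
for base, value in multipliers.items():
    print(f"5*{base}(80) = {value:.1f}")
# prints:
# 5*log2(80) = 31.6
# 5*ln(80) = 21.9
```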

Ilya