perl-unicode

Re: In-Band Information Considered Harmful

1998-10-25 09:18:22
On Fri, 23 Oct 1998 chip(_at_)perlsupport(_dot_)com wrote:
According to Ilya Zakharevich:
I disagree.  Before you try to make this assertion again, please
explain how Perl would properly handle the 'll' case with code-based
metadata.  Be sure to allow for the various kinds of metadata nesting
behavior: <b> doesn't nest, <li> nests, and <p> marks a spot instead
of a region.  And Perl's RE and other character-processing engines
need to know this to handle them properly in the 'll' case.

Who cares how it is implemented?  We are discussing *semantics* here.

You've missed the point.  Big time.

Here is the point, in large friendly letters for clarity:

 * The Perl Runtime Should Not Have To Know  *
 *   About Nesting Behaviors of Metadata     *

Sure, that's easy to agree with.  But I don't understand why we can't have
it both ways.  It would be a mistake to sacrifice pure text-stream
performance for any reason.  For some applications, regexes can never be
fast enough.  OTOH, I can certainly see the potential need for:

(1) Knowledge of specific metadata encoding schemes does not belong in
the Perl source code.  Period.

(2) Imposing metadata state tracking on every non-trivial regex is
also unacceptable.  ("Non-trivial" in this context means "using memory
parens or metadata queries".  Note also that using "$&" even once
imposes memory-paren-like behavior on _all_ regexes.)

But isn't this the whole point of moving to C++?  Won't we more easily be
able to provide multiple implementations of similar interfaces?  Why
shouldn't a string representation be abstract?  Designing this abstraction
is what seems interesting.  From that perspective, the whole discussion
looks a lot more relevant.
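
To make the "abstract string representation" idea concrete, here is a
minimal C++ sketch.  Every name in it (TextRep, PlainText, AnnotatedText,
tag_at, count_char) is hypothetical and invented for illustration; nothing
here is taken from the Perl sources or from any actual proposal.  It only
shows one interface with two implementations, where the plain one stores
no metadata at all and the annotated one keeps its metadata out of band.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical interface: the names are assumptions, not Perl internals.
class TextRep {
public:
    virtual ~TextRep() = default;
    virtual std::size_t length() const = 0;
    virtual char at(std::size_t i) const = 0;
    // Out-of-band metadata query; the default answer is "no metadata".
    virtual std::string tag_at(std::size_t) const { return ""; }
};

// Plain text stream: no metadata storage and no metadata bookkeeping.
class PlainText : public TextRep {
    std::string data_;
public:
    explicit PlainText(std::string s) : data_(std::move(s)) {}
    std::size_t length() const override { return data_.size(); }
    char at(std::size_t i) const override { return data_.at(i); }
};

// Same characters, plus annotations kept beside the text, not inside it.
class AnnotatedText : public TextRep {
    std::string data_;
    std::map<std::size_t, std::string> tags_;  // offset -> tag in effect from there on
public:
    explicit AnnotatedText(std::string s) : data_(std::move(s)) {}
    void set_tag_from(std::size_t offset, std::string tag) {
        tags_[offset] = std::move(tag);
    }
    std::size_t length() const override { return data_.size(); }
    char at(std::size_t i) const override { return data_.at(i); }
    std::string tag_at(std::size_t i) const override {
        auto it = tags_.upper_bound(i);         // first tag starting after i
        if (it == tags_.begin()) return "";     // nothing starts at or before i
        return std::prev(it)->second;
    }
};

// A consumer written against the interface only; it never sees metadata.
std::size_t count_char(const TextRep& t, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < t.length(); ++i)
        if (t.at(i) == c) ++n;
    return n;
}
```

Note that a virtual call per character is itself a cost, so a real engine
would presumably dispatch once per operation rather than once per
character.  The sketch is only meant to show where the seam could go.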

OTOH, I've seen lots of problems caused by investing in poor
abstractions.  Certainly, it is worth making a best effort (Perl5 is
extraordinary), but sometimes you just can't predict the future well
enough.  Maybe the real question is: how can we specify abstractions that
can evolve with the least amount of breakage?  Perl5 is excellent at
evolving, using approaches such as:

  compile-time bits
  compile-time inheritance
  compile-time v-tables
  look-aside magic
  run-time inheritance
  etc...

How can we minimize the perceived difference between these approaches?
What other approaches are available?  Can we quantitatively optimize data
differentiation/representation across potential implementations?

The btree approach is acceptable in the sense that *any* operation gets
only a small-multiplier (say, x20) slow-down.

You cannot be taken seriously if you say "times 20".  That's impossible.

Any performance hit is possible when you consider adding an arbitrarily
complex feature.  This is not necessarily bad, but optional it must be!
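
For what the per-operation cost in the btree remark looks like, here is a
rough C++ sketch of a chunked string.  It is a simplified stand-in, not a
real btree: a flat sorted vector of chunk offsets replaces the tree, and
the class name ChunkedText and its methods are made up for this example.
The point is only that indexing becomes a search over chunk offsets rather
than a single pointer add, a cost that a plain flat string never pays.  No
claim is made here about what the actual multiplier would be.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chunked string; a sorted offset index stands in for a btree.
class ChunkedText {
    std::vector<std::string> chunks_;
    std::vector<std::size_t> starts_;  // starts_[k] = offset of chunks_[k][0]
    std::size_t length_ = 0;
public:
    void append(std::string piece) {
        if (piece.empty()) return;     // keeps starts_ strictly increasing
        starts_.push_back(length_);
        length_ += piece.size();
        chunks_.push_back(std::move(piece));
    }

    std::size_t length() const { return length_; }

    // Indexing costs a binary search over the chunk offsets.
    char at(std::size_t i) const {
        if (i >= length_) throw std::out_of_range("ChunkedText::at");
        auto it = std::upper_bound(starts_.begin(), starts_.end(), i);
        std::size_t k = static_cast<std::size_t>(it - starts_.begin()) - 1;
        return chunks_[k][i - starts_[k]];
    }

    // Copy everything back into one contiguous string.
    std::string flatten() const {
        std::string out;
        out.reserve(length_);
        for (const auto& c : chunks_) out += c;
        return out;
    }
};
```

Code that wants raw text-stream speed could call flatten() once and work
on the result, which is one way of keeping the extra cost optional.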