perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 15:50:20
According to Ilya Zakharevich:
Chip Salzenberg writes:
In contrast, working frame-based, I only need walk the attribute tree,
find the attributes that apply to the given characters, and copy them.
That's O(log N) or so -- certainly better than O(N).  More
significantly, it requires *no* knowledge of metadata semantics.
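
For illustration only, here is a minimal sketch of the frame-based lookup described above. It is written in Python purely for brevity and is not a proposal for Perl's internals. The class name AttributedString, the run layout (start, end, attrs), and the substr method are all hypothetical. A sorted list searched with bisect stands in for the attribute tree. The attribute values are opaque and are never inspected, only copied.

```python
from bisect import bisect_right


class AttributedString:
    """Hypothetical container: plain text plus out-of-band attribute runs."""

    def __init__(self, text, runs):
        # runs: non-overlapping (start, end, attrs) tuples, end exclusive.
        # attrs is opaque here; it is copied but never interpreted.
        self.text = text
        self.runs = sorted(runs, key=lambda r: r[0])
        self._starts = [r[0] for r in self.runs]

    def substr(self, start, end):
        # O(log N): locate the last run that begins at or before `start`.
        i = max(bisect_right(self._starts, start) - 1, 0)
        copied = []
        # Then visit only the runs that overlap [start, end).
        while i < len(self.runs) and self.runs[i][0] < end:
            s, e, attrs = self.runs[i]
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                # Clip the run to the window and rebase it to offset 0.
                copied.append((lo - start, hi - start, attrs))
            i += 1
        return AttributedString(self.text[start:end], copied)


hello = AttributedString("hello", [(0, 5, "bold")])
ll = hello.substr(2, 4)
```

In this sketch, ll.text is "ll" and ll.runs is [(0, 2, "bold")]. Nothing in substr needed to know what "bold" means or whether it nests.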

Same for inline data.  There is absolutely no difference between the
semantics of having metadata inline or separate.

I disagree.  Before you try to make this assertion again, please
explain how Perl would properly handle the 'll' case with code-based
metadata.  Be sure to allow for the various kinds of metadata nesting
behavior: <b> doesn't nest, <li> nests, and <p> marks a spot instead
of a region.  And Perl's RE and other character-processing engines
need to know this to handle them properly in the 'll' case.

Who cares how it is implemented?  We are discussing *semantics* here.

You've missed the point.  Big time.

Here is the point, in large friendly letters for clarity:

 * The Perl Runtime Should Not Have To Know  *
 *   About Nesting Behaviors of Metadata     *

<ul> (not <li>; thinko above, sorry) nests, whereas <b> doesn't, and
<p> marks a spot, so nesting is not a relevant question for it.
Therefore, proper extraction of '<b>ll</b>' from '<b>hello</b>' would
require Perl to (1) be aware of the nesting behaviors of embedded
codes, and (2) keep track of the current state of all those attributes
at any point in a string.

I find both of these implications unacceptable.

(1) Knowledge of specific metadata encoding schemes does not belong in
the Perl source code.  Period.

(2) Imposing metadata state tracking on every non-trivial regex is
also unacceptable.  ("Non-trivial" in this context means "using memory
parens or metadata queries".  Note also that using "$&" even once
imposes memory-paren-like behavior on _all_ regexes.)
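
To make requirements (1) and (2) concrete, here is a rough sketch of what in-band extraction of '<b>ll</b>' from '<b>hello</b>' would involve. It is again in Python for brevity only. The TAG_BEHAVIOR table, the TOKEN pattern, and extract_inline are hypothetical names, and the sketch is deliberately simplified: it re-wraps the result only in the tags open where the extracted text begins, and it ignores tags that open or close inside the extracted range.

```python
import re

# (1) Hypothetical table of nesting behaviors the runtime would have to carry.
TAG_BEHAVIOR = {"b": "region", "ul": "nesting", "p": "point"}

# Matches either a simple <tag> or </tag>, or a run of visible text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def extract_inline(marked, start, end):
    """Return visible characters [start, end), re-wrapped in the open tags."""
    open_tags = []   # (2) state tracked across the entire scan
    prefix = None    # tags in effect where the extracted text begins
    pieces = []
    pos = 0          # offset counted in visible characters only
    for m in TOKEN.finditer(marked):   # O(N): every token is visited
        closing, name, text = m.groups()
        if text is None:
            kind = TAG_BEHAVIOR[name]  # an unknown tag raises KeyError
            if kind == "point":
                continue               # marks a spot, opens no region
            if closing:
                # Close the most recent matching open tag, if any.
                for i in range(len(open_tags) - 1, -1, -1):
                    if open_tags[i] == name:
                        del open_tags[i]
                        break
            elif kind == "nesting" or name not in open_tags:
                # Nesting tags stack; a non-nesting tag is open at most once.
                open_tags.append(name)
            continue
        lo, hi = max(pos, start), min(pos + len(text), end)
        if lo < hi:
            if prefix is None:
                prefix = list(open_tags)
            pieces.append(text[lo - pos:hi - pos])
        pos += len(text)
    prefix = prefix or []
    opening = "".join(f"<{t}>" for t in prefix)
    closing_tags = "".join(f"</{t}>" for t in reversed(prefix))
    return opening + "".join(pieces) + closing_tags


result = extract_inline("<b>hello</b>", 2, 4)
```

Here result is '<b>ll</b>', but only because the function scanned the whole string and consulted a table of tag semantics along the way.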

The btree approach is acceptable in the sense that *any* operation gets
only a small-multiplier (say, x20) slow-down.

You cannot be taken seriously if you say "times 20".  That's impossible.
-- 
Chip Salzenberg      - a.k.a. -      <chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K