perl-unicode

Re: In-Band Information Considered Harmful

1998-10-25 16:04:42
Chaim Frenkel writes:
IZ> I do not care about implementation, as long as any editing operation
IZ> leaves numerator and denominator adjacent to each other.  This is
IZ> automatically so if the *implementation* uses 3 boundaries to get the
IZ> regions.

Err, how? substr($text,0,5) = "" would be pointing off into space.

Why?  lvalue substr() is (mostly) equivalent to a combination of
deletion and insertion.  It is easy to define what deletion and
insertion do to a given markup.  (Especially if markup has "width", so
you can specify that you want to insert underlined "foo" between the
start-bold and start-italic markups of
        "<b><i>bar</i> baz </b>"
)
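
To make the "deletion plus insertion" reading concrete, here is a
minimal sketch in plain Perl with no markup involved.  The variable
names are only for illustration.  It assumes a perl with 4-argument
substr(); the point is just that the lvalue form and the two-step form
give the same string.

```perl
use strict;
use warnings;

# Replace a region via lvalue substr().
my $lvalue = "Hello, world";
substr($lvalue, 0, 5) = "Howdy";

# The same edit spelled out as a deletion followed by an insertion.
my $two_step = "Hello, world";
substr($two_step, 0, 5, "");        # delete 5 characters at offset 0
substr($two_step, 0, 0, "Howdy");   # insert at offset 0

print "$lvalue\n";      # Howdy, world
print "$two_step\n";    # Howdy, world
```

If each of the two primitive steps has a defined effect on markup, the
combined lvalue operation gets one for free.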

If you are still confused, substr($text,0,5) = "" will (depending on
the properties of "fraction" wrt deletion) leave either an empty string
without markup, or an empty string marked up as a fraction with empty
numerator and empty denominator.
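
A rough sketch of those two outcomes, under loudly stated assumptions:
the markers are modelled as private-use characters, $text is taken to
hold just the fraction, and delete_range() with its $keep_markup flag
is a hypothetical helper standing in for the "property of fraction wrt
deletion".  None of this is an existing core interface.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the zero-width markers <fr(>, <fr|>, <fr)>;
# private-use code points chosen only for this sketch.
my ($FR_OPEN, $FR_BAR, $FR_CLOSE) = ("\x{E000}", "\x{E001}", "\x{E002}");

# Delete $len characters at $off.  With $keep_markup true, markers inside
# the deleted range survive the deletion; otherwise they go with the text.
sub delete_range {
    my ($text, $off, $len, $keep_markup) = @_;
    my $gone = substr($text, $off, $len, "");   # returns the removed part
    if ($keep_markup) {
        (my $markers = $gone) =~ s/[^\x{E000}-\x{E002}]//g;
        substr($text, $off, 0, $markers);       # put the markers back
    }
    return $text;
}

my $fraction = $FR_OPEN . "3" . $FR_BAR . "4" . $FR_CLOSE;   # length 5

my $bare  = delete_range($fraction, 0, 5, 0);   # markup deleted too
my $empty = delete_range($fraction, 0, 5, 1);   # empty fraction remains

printf "%d %d\n", length($bare), length($empty);   # 0 3
```

Either result is well defined; which one you get is a per-markup
choice, not something the deletion itself leaves open.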

And with inband data anything at all could happen.

???  Markup is going to be "consistent" after any CORE:: operation.
Here "consistent" has a rather weak sense, but my experience with eText
shows that this weak consistency coincides with "real" consistency in
many cases.

Basically, any method would require that the metadata be adjusted
along with the underlying text.

"method"?  We are discussing what the *core* needs to know about
markup.  My claim is that one can get "consistency"-preserving behaviour
of CORE:: operations with just a handful of bits of info per markup.

Using an alternative representation where regions would be manipulated
(rather than raw characters) would help avoid the issue, but that
isn't the current schema.

"Current schema"?  What do you mean?  And again, you are discussing
implementation, which is not related to semantics at all.

One could remember from which direction the insertion is happening.
After the '3' or before the '4'.

IZ> How would 

IZ>     substr($text, 3, 0) = "5"
IZ> know?

You are correct. The current language doesn't have such a concept. I was
simply throwing out an idea. But your suggestion has the same weakness:
on which side of the boundary are you inserting?

With inband markup the string looks like this

      "a <fr(>3<fr|>4<fr)> of"

with <fr*> representing some chars marked as having 0-width.  However,
they have *length* (length of any char is 1, same as with the current
implementation of utf8).  I proposed to have a pragma which regulates
whether the operations are performed in terms of width or length.

With this pragma set to "length" 

     substr($text, 4, 0) = "5"

makes the fraction into 35/4,

     substr($text, 5, 0) = "5"

makes the fraction into 3/54.
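
The "length" case can be tried with today's perl, as a sketch only:
the three <fr*> markers are modelled here by private-use characters
(my choice for illustration, not a proposal), and show() is a helper
invented to print them readably.  No pragma is involved; ordinary
substr() already counts each marker as one character.

```perl
use strict;
use warnings;

# Assumed stand-ins for <fr(>, <fr|>, <fr)>: one character each.
my ($FR_OPEN, $FR_BAR, $FR_CLOSE) = ("\x{E000}", "\x{E001}", "\x{E002}");

# Render markers visibly so the result can be printed as plain ASCII.
sub show {
    my ($text) = @_;
    $text =~ s/\x{E000}/<fr(>/g;
    $text =~ s/\x{E001}/<fr|>/g;
    $text =~ s/\x{E002}/<fr)>/g;
    return $text;
}

my $text = "a " . $FR_OPEN . "3" . $FR_BAR . "4" . $FR_CLOSE . " of";

my $numer = $text;
substr($numer, 4, 0) = "5";   # offset 4: between "3" and <fr|>

my $denom = $text;
substr($denom, 5, 0) = "5";   # offset 5: between <fr|> and "4"

print show($numer), "\n";   # a <fr(>35<fr|>4<fr)> of
print show($denom), "\n";   # a <fr(>3<fr|>54<fr)> of

# "Width" ignores the zero-width markers; "length" counts them.
(my $visible = $text) =~ s/[\x{E000}-\x{E002}]//g;
printf "length=%d width=%d\n", length($text), length($visible);  # length=10 width=7
```

In terms of width both insertions happen at the same place, after the
"3"; it is only the length offsets 4 and 5 that tell them apart.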

I think that perhaps not allowing cross region manipulation (i.e. 
at your own risk) would be the best approach.

Putting your head in the sand is the best approach in many cases.
However, I think we can do better in this particular case.

Ilya