perl-unicode

Re: In-Band Information Considered Harmful

1998-10-26 02:37:19
Chip Salzenberg <chip(_at_)perlsupport(_dot_)com> writes:
According to Nick Ing-Simmons:
Well, "find the next word which is not in the same font as I am currently
using" is something my PostScript renderer does do.

Wouldn't it rather iterate through blocks of text with identical metadata?
Isn't that the primitive you'd prefer to build on?

Probably, but not necessarily. For example:

<h2>Title</h2>
<b>Text Here</b>

If <h2> implies bold, all of that might be in the same font, but the two
fragments would have different meta-data.

But in such cases rendering as two fragments is probably fine.
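
A minimal sketch of that "iterate through blocks with identical metadata"
primitive, to show how the grouping depends on which property is compared.
The fragment layout, the %font_for table, and the runs() helper are all
hypothetical and only serve the illustration; here <h2> and <b> are assumed
to map to the same bold font.

```perl
use strict;
use warnings;

# Hypothetical fragments: each is [text, tag]. The tag-to-font table is
# an assumption; here both <h2> and <b> happen to render in bold.
my @fragments = (['Title', 'h2'], ['Text Here', 'b'], ['plain text', '']);
my %font_for  = (h2 => 'bold', b => 'bold', '' => 'roman');

# Group adjacent fragments whose value under $key_of is identical.
sub runs {
    my ($key_of, @frags) = @_;
    my @runs;
    for my $frag (@frags) {
        my $key = $key_of->($frag);
        if (@runs && $runs[-1]{key} eq $key) {
            push @{ $runs[-1]{frags} }, $frag;
        } else {
            push @runs, { key => $key, frags => [$frag] };
        }
    }
    return @runs;
}

# Compare by raw meta-data (the tag) and by derived rendering (the font).
my @by_tag  = runs(sub { $_[0][1] }, @fragments);
my @by_font = runs(sub { $font_for{ $_[0][1] } }, @fragments);

printf "%d runs by tag, %d runs by font\n", scalar @by_tag, scalar @by_font;
```

Grouping by tag keeps the heading and the bold text apart; grouping by font
merges them into one run. Either is a reasonable primitive, depending on
whether the caller wants semantic or rendering boundaries.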


Which is markup - to divide input into tokens.

I disagree.  Anything that is done for strictly semantic reasons can't
sanely be called "markup".  But perhaps we've gone too far into
angel-counting.

Very likely. 

But a lot of markup _is_ semantic, e.g. <h2> vs <b> above: h2-ness does
not directly affect the rendering (say bold); it is semantic
(this is a level 2 heading).

The root of my point is that there is information which is needed in both
the data and the meta-data views. This is implicit in in-band meta-data
(tags). If we go for out-of-band meta-data, then this information may have
to be (redundantly) represented in both spaces, and some mechanism is needed
to tie the two together in the face of modification.

e.g.

$s = "perl <emph>is</emph> great.";

$s =~ s/is/will be/;

Wants to become (one assumes)

"perl <emph>will be</emph> great."

Not 

"perl <emph></emph>will be great."

or worse

"perl <emph>wi</emph>ll be great."


These latter two are all too easy to implement by accident if data/meta-data
hooks are not strong enough.
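
To make that concrete, here is a rough sketch of one possible out-of-band
scheme: the text is stored without tags, and the meta-data is a list of
[start, end, tag] spans with an exclusive end offset. The representation and
the names subst_with_spans() and render() are assumptions for illustration,
not an existing API. The match offsets come from Perl's @- and @+ arrays.

```perl
use strict;
use warnings;

# Hypothetical out-of-band form: plain text plus [start, end, tag] spans.
my $text  = "perl is great.";
my @spans = ([5, 7, 'emph']);

# Replace the first match of $pattern and adjust the spans so that a span
# covering the match grows or shrinks with the replacement.
sub subst_with_spans {
    my ($text_ref, $spans, $pattern, $replacement) = @_;
    return 0 unless $$text_ref =~ $pattern;
    my ($from, $to) = ($-[0], $+[0]);
    my $delta = length($replacement) - ($to - $from);
    substr($$text_ref, $from, $to - $from, $replacement);
    for my $span (@$spans) {
        # Offsets at or after the end of the match shift by $delta.
        # A span start equal to $from stays put, so the replacement
        # lands inside the span rather than after it.
        $span->[0] += $delta if $span->[0] >= $to;
        $span->[1] += $delta if $span->[1] >= $to;
    }
    return 1;
}

# Render back to in-band form for inspection (non-nested spans only).
sub render {
    my ($plain, $spans) = @_;
    my $out = $plain;
    for my $span (sort { $b->[0] <=> $a->[0] } @$spans) {
        my ($start, $end, $tag) = @$span;
        substr($out, $end, 0, "</$tag>");
        substr($out, $start, 0, "<$tag>");
    }
    return $out;
}

subst_with_spans(\$text, \@spans, qr/is/, 'will be');
print render($text, \@spans), "\n";   # perl <emph>will be</emph> great.

# Naive version for contrast: edit the text, forget the spans.
my @stale = ([5, 7, 'emph']);
(my $naive = "perl is great.") =~ s/is/will be/;
print render($naive, \@stale), "\n";  # perl <emph>wi</emph>ll be great.
```

The sketch handles only a single, non-overlapping match. Matches that
straddle a span boundary, nested spans, and s///g would each need a policy
of their own, which is exactly where the hooks have to be strong.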


-- 
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.