perl-unicode

In-Band Information Considered Harmful

1998-10-22 20:20:43
[ followups redirected to perl6-porters ]

According to Ilya Zakharevich:
[a proposal for] inband data (since one bit of these infinite
amount may denote that the char should be interpreted not as a char,
but as an address/id of some external data, say color/font for text
processing application)

Might I suggest deprecating in-band information and switching to a
model that keeps content and metadata separate?

By this, I do not mean to imply any breakage of existing code.
However, as we look to the future, let us learn the lessons that
HTML and XML are teaching us -- by their _bad_ examples.

Consider: Why should it be that "<b>Hello</b> there!" no longer
matches the pattern /hello there/i ?  Wouldn't it be nice to keep the
metadata off to the side?  Then you have a much easier time of pattern
matching, and as a bonus you're no longer limited to one set of
metadata.
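A minimal sketch of the difference, in Python rather than Perl, and not a description of anything Perl actually implements. It assumes a hypothetical "standoff" representation in which the content is a plain string and the markup is a separate list of (start, end, attribute) spans. The pattern fails against the in-band string because the tags interrupt the text. It succeeds against the plain content, and the markup covering the matched region can still be recovered afterwards.

```python
import re

# In-band: markup tags are embedded in the character stream itself.
inband = "<b>Hello</b> there!"

# Out-of-band (hypothetical representation): plain content plus a
# separate list of (start, end, attribute) spans indexing into it.
content = "Hello there!"
markup = [(0, 5, "bold")]  # "Hello" is bold

pattern = re.compile(r"hello there", re.IGNORECASE)

inband_match = pattern.search(inband) is not None    # tags break the match
standoff_match = pattern.search(content) is not None  # content is unmarked


def spans_in(match, spans):
    """Return the markup spans that overlap the matched region."""
    start, end = match.span()
    return [s for s in spans if s[0] < end and s[1] > start]


m = pattern.search(content)
print(inband_match, standoff_match, spans_in(m, markup))
```

Matching and formatting become independent operations. The regex never needs to know that markup exists, and nothing about the markup is lost.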

At the Conference, I was pleased to speak at length with Ted Nelson on
many subjects, and he made the point to me that one of the Xanadu
system's best features was its total separation of markup
(i.e. formatting and hyperlinks) from content.  It would have allowed
me, for example, to use one set of markup and you to use another,
all without duplicating or mangling the original content.  None of
this is at all feasible with HTML/XML, which would require either
duplicating content or commingling various independent sets of markup
data.
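To make the Xanadu point concrete, here is a minimal Python sketch. The `Span` type, the `render` helper, the bracketed output syntax, and the `link:example` attribute are all hypothetical, chosen only for illustration. Two independent markup sets refer to the same content string. Rendering either one leaves the content untouched, and neither set has to be merged with the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive offset into the content
    end: int    # exclusive offset into the content
    attr: str   # hypothetical attribute name


content = "Hello there!"

# Two independent, hypothetical markup sets over the same content.
my_markup = [Span(0, 5, "bold")]
your_markup = [Span(6, 11, "link:example")]


def render(text, spans):
    """Wrap each span as [attr]...[/attr]; assumes spans do not overlap."""
    out, pos = [], 0
    for s in sorted(spans, key=lambda s: s.start):
        out.append(text[pos:s.start])
        out.append(f"[{s.attr}]{text[s.start:s.end]}[/{s.attr}]")
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


print(render(content, my_markup))
print(render(content, your_markup))
```

Adding a third markup set means adding another span list. The content and the existing sets stay exactly as they were.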

Applying this separation of data from metadata to the internals of
Perl seems like a promising way to support arbitrary markup without
compromising Perl's text-slinging heritage.
-- 
Chip Salzenberg               - a.k.a. -              
<chip(_at_)perlsupport(_dot_)com>
 "... under cover of afternoon in the biggest car in the county?!" //MST3K