perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 08:02:00
Ilya Zakharevich wrote :
|| Chip Salzenberg writes:
|| > > > At the Conference, I was pleased to speak at length with Ted Nelson on
|| > > > many subjects, and he made the point to me that one of the Xanadu
|| > > > system's best features was its total separation of markup
|| > > > (i.e. formatting and hyperlinks) from content.
|| > > 
|| > > Can you provide more context/details?
|| > 
|| > I can't ever do justice to Ted's ideas.  But his idea was a WWW-like
|| > service where each person can create his own farm of hyperlinks --
|| > content need not have all of its hyperlinks included at creation time;
|| > rather, hyperlinks are added on by people who discover/decide where it
|| > would be a good idea to link things.  And my set of links may not be
|| > the same as your set, since your idea of relevant connection may differ
|| > from mine.
|| 
|| This is how Emacs implements its markup.  It is a binary tree which
|| contains attributes-boundaries in the order they appear in the
|| buffer.
|| 
|| However, regular expressions do not map well to this picture.  Emacs
|| has 3 different notions of search: by REx, by syntax (find matching
|| paren etc.) and by text attributes.  There is no simple way to combine
|| them.

A separate but related point came up at the conference, in Tim
Bray's talk on XML.

He showed a list of the same common sentence tagged with a variety of
different embellishments, and asked whether each tagged version should
be considered equivalent to the underlying sentence.  Depending upon
the particular tags, some were and some were not - some tags affect the
meaning as well as the presentation.  (If I recall correctly, one
example would be that "<SARCASM>something</SARCASM>" should be
considered as not equal to "something", while
"<EMPHASIS>something</EMPHASIS>" would be equal.  His examples were a
lot better though.)

It is not clear that there *can be* a binary separation between
content and markup.  When a tag fits partially into both camps, an
in-line scheme works better because it is the reader that gets to
decide whether a particular kind of item is content or markup.  The
downside is that there can be many such decisions, so you really want
an underlying default choice of content/markup that is usually right
so that it only need be overridden for unusual cases.  Without good
defaults, reading is a lot harder.
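
To make the default-plus-override idea concrete, here is a rough sketch
in Python.  It is only an illustration, not anything Tim Bray or Emacs
actually does: the names DEFAULT_PRESENTATIONAL, normalize and
equivalent are made up for this example, and the tag syntax is assumed
to be the simple "<NAME>...</NAME>" form used above.  Tags in the
default table are treated as markup and dropped before comparing;
everything else stays in the text as content unless the reader says
otherwise.

```python
import re

# Hypothetical default classification (an assumption for this sketch):
# tags listed here are treated as pure presentation; any tag that is
# not listed is treated as content.
DEFAULT_PRESENTATIONAL = {"EMPHASIS"}

# Matches simple opening and closing tags such as <EMPHASIS> or </EMPHASIS>.
TAG = re.compile(r"</?([A-Za-z]+)>")


def normalize(text, overrides=None):
    """Drop tags the reader regards as markup; keep all other tags in place.

    overrides maps a tag name to True (markup) or False (content) and
    takes precedence over the default classification.
    """
    overrides = overrides or {}

    def replace(match):
        name = match.group(1).upper()
        is_markup = overrides.get(name, name in DEFAULT_PRESENTATIONAL)
        return "" if is_markup else match.group(0)

    return TAG.sub(replace, text)


def equivalent(a, b, overrides=None):
    """Compare two strings after removing whatever counts as markup."""
    return normalize(a, overrides) == normalize(b, overrides)


# Under the defaults, EMPHASIS is markup and SARCASM is content.
emphasis_default = equivalent("<EMPHASIS>something</EMPHASIS>", "something")
sarcasm_default = equivalent("<SARCASM>something</SARCASM>", "something")

# A reader who regards SARCASM as mere presentation overrides the default.
sarcasm_override = equivalent(
    "<SARCASM>something</SARCASM>", "something", {"SARCASM": True}
)
```

With the defaults, the EMPHASIS version compares equal to the bare
sentence and the SARCASM version does not; passing {"SARCASM": True}
flips the second result.  The point is that the decision lives with the
reader, and the default table only has to be overridden for the odd
cases.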

-- 
objects:                                    | John Macdonald
    Think of them as data with an attitude. |   jmm(_at_)elegant(_dot_)com