perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 14:37:33
John Macdonald writes:
He showed a list, the same common sentence tagged with a variety of
different embellishments.  He asked whether they should be
considered equivalent to the underlying sentence.  Depending upon the
particular tags, some were and some were not - some tags affect the
meaning as well as the presentation.  (If I recall correctly, one
example would be that "<SARCASM>something</SARCASM>" should be
considered as not equal to "something", while
"<EMPHASIS>something</EMPHASIS>" would be equal.  His examples were a
lot better though.)

It is not clear that there *can be* a binary separation between
content and markup.  When a tag partially fits into both camps, an
in-line scheme works better because it is the reader that gets to
decide whether a particular kind of item is content or markup.  The
downside is that there can be many such decisions, so you really want
an underlying default choice of content/markup that is usually right
so that it only need be overridden for unusual cases.  Without good
defaults, reading is a lot harder.

See my reply to Chaim's (?) mail.  The default with ignore-markup
should be to ignore all 0-width data (markup).  If you want to require
some markup, you just insert the corresponding "char" into a REx.  If
you want to require the absence of some markup, you insert a negative
lookahead.
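
A minimal sketch of this idea, written in Python rather than Perl.
Everything in it is an assumption made for illustration: the private-use
code points standing in for 0-width tags, the names EMPH_ON, SARC_ON and
MARKUP, and the helper ignore_markup(), which emulates an ignore-markup
mode by allowing any markup before each literal character.  It is not an
existing Perl feature or a proposed implementation.

```python
import re

# Hypothetical in-band markup: private-use code points act as 0-width tags.
EMPH_ON, EMPH_OFF = "\uE000", "\uE001"
SARC_ON, SARC_OFF = "\uE002", "\uE003"
MARKUP = "[\uE000-\uE0FF]"  # class covering every (assumed) markup code point


def ignore_markup(literal: str) -> str:
    """Return a pattern for `literal` that skips markup before each character."""
    return "".join(f"{MARKUP}*{re.escape(ch)}" for ch in literal)


plain = "something"
emphatic = f"{EMPH_ON}something{EMPH_OFF}"
sarcastic = f"{SARC_ON}something{SARC_OFF}"

# Default: all markup is ignored, so every variant matches.
default_re = re.compile(ignore_markup("something"))

# Require some markup: put the corresponding "char" into the pattern.
require_sarcasm_re = re.compile(f"{MARKUP}*{SARC_ON}" + ignore_markup("something"))

# Require the absence of some markup: a negative lookahead, checked at the
# start of the text because match() is anchored there.
no_sarcasm_re = re.compile(f"(?!{MARKUP}*{SARC_ON})" + ignore_markup("something"))

for name, text in [("plain", plain), ("emphatic", emphatic), ("sarcastic", sarcastic)]:
    print(
        name,
        bool(default_re.match(text)),
        bool(require_sarcasm_re.match(text)),
        bool(no_sarcasm_re.match(text)),
    )
```

The patterns are applied with match(), which anchors at the start of the
text.  With an unanchored search the negative lookahead could be
sidestepped by starting the match just after the tag.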

This is the advantage of in-band meta-data: no new syntax is required.

Ilya