John Macdonald writes:
He showed a list, the same common sentence tagged with a variety of
different embellishments. He asked whether each should be
treated as equivalent to the underlying sentence. Depending upon the
particular tags, some were and some were not - some tags affect the
meaning as well as the presentation. (If I recall correctly, one
example would be that "<SARCASM>something</SARCASM>" should be
considered as not equal to "something", while
"<EMPHASIS>something</EMPHASIS>" would be equal. His examples were a
lot better though.)
It is not clear that there *can be* a binary separation between
content and markup. When a tag partially fits into both camps, an
in-line scheme works better because it is the reader that gets to
decide whether a particular kind of item is content or markup. The
downside is that there can be many such decisions, so you really want
an underlying default choice of content/markup that is usually right
so that it only need be overridden for unusual cases. Without good
defaults, reading is a lot harder.
See my reply to Chaim's (?) mail. The default with ignore-markup
should be to ignore all 0-width data (markup). If you want to require
some markup, you just insert the corresponding "char" into a REx. If
you want to require the absence of some markup, you insert a negative
lookahead.
This is the advantage of inband meta-data: no new syntax is required.
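
A minimal sketch of the idea, in Python rather than Perl, and not a description of any existing implementation. It assumes a hypothetical encoding in which each tag is a single private-use code point stored in-band; the names EMPH_ON, SARC_ON, MARKUP and loose() are made up for illustration. loose() stands in for an "ignore-markup" mode by allowing any run of markup characters before each literal character.

```python
import re

# Hypothetical in-band encoding: every tag is one private-use code point.
EMPH_ON, EMPH_OFF = "\ue000", "\ue001"
SARC_ON, SARC_OFF = "\ue002", "\ue003"
MARKUP = "[\ue000-\uf8ff]"  # character class matching any markup character


def loose(text):
    """Return regex source matching `text`, skipping markup before each char."""
    return "".join(f"{MARKUP}*{re.escape(ch)}" for ch in text)


plain = "something"
emphasized = f"{EMPH_ON}something{EMPH_OFF}"
sarcastic = f"{SARC_ON}something{SARC_OFF}"

# Default: all markup is ignored.
default_re = re.compile(loose("something"))

# Require some markup: put the corresponding character into the pattern.
require_sarcasm = re.compile(SARC_ON + loose("something"))

# Require the absence of some markup: a negative lookahead over the
# markup that may precede the text.
forbid_sarcasm = re.compile(f"(?!{MARKUP}*{SARC_ON})" + loose("something"))
```

Used with .match(), which anchors at the start of the string, default_re accepts all three strings, require_sarcasm accepts only the sarcastic one, and forbid_sarcasm accepts the plain and emphasized ones but rejects the sarcastic one. The patterns use nothing beyond ordinary character and lookahead syntax.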
Ilya