Re: risks of markup (bold, etc.)

1993-09-03 09:11:20
Why?  Well, when the Web began it was "hypertext" based.  Everything was
supposed to be simple, plain text accessible by everybody.  Though HTML was
based on SGML, this was more for the sake of being "standard" than to support fancy markup.
The core team had a nice plan as to how they would expand, slowly and in step.
Then the NCSA came along, with their elegant multimedia X-based browser.
Suddenly the web became "hypermedia," and (as it supported much more of SGML),
even plain text could be marked up in much fancier ways.  So people started
writing documents depending on the capabilities of NCSA Mosaic, leaving the
earlier browsers behind.  In this case, that lapse changed the entire meaning
of a rather important sentence.

There are a few points here:

Steve and Frederick,

I think one could draw some slightly different lessons from this.  Perhaps
the differences will tell us something.

Proposed alternate main principle:  Any time one designs something that
   can be extended, it is necessary to be extremely clear about what an
   "old" processor should do when it encounters an unrecognized extension.
   If that action is "discard", then one has to be hyper-careful about
   what extensions are permitted.  In some cases, clever design would
   make lexical distinctions between (SGML-speak) "safe to discard tag",
   "safe to discard element", and "not safe to ignore" extensions that
   would permit old processors to behave with relative intelligence.  In
   more conventional language, these would be equivalent to "safe to
   discard verb", "safe to discard entire sentence/command/statement",
   and "better reject program until this can be figured out".
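A minimal Python sketch of how an "old" processor could act on the three-way distinction proposed above. The lexical convention here is purely hypothetical (nothing like it exists in SGML or HTML): an unrecognized element whose name starts with "~" is safe to discard as a tag (drop the markup, keep the text), one whose name starts with "-" is safe to discard as a whole element (drop markup and text), and any other unrecognized element is not safe to ignore. The names `Element`, `render`, `KNOWN`, and `UnsafeExtension` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Elements this hypothetical "old" processor understands.
KNOWN = {"p", "b"}


class UnsafeExtension(Exception):
    """Raised for an unrecognized element not marked as discardable."""


@dataclass
class Element:
    name: str
    children: List[Union[str, "Element"]] = field(default_factory=list)


def render(node: Union[str, Element]) -> str:
    """Flatten a tree to plain text the way an old processor might.

    Hypothetical convention (an assumption, not part of SGML/HTML):
      "~name" -> safe to discard the tag: drop markup, keep its text
      "-name" -> safe to discard the element: drop markup and its text
      other unrecognized names -> not safe to ignore: reject
    """
    if isinstance(node, str):
        return node
    if node.name in KNOWN or node.name.startswith("~"):
        # Known element, or a tag that may be discarded: keep the content.
        return "".join(render(child) for child in node.children)
    if node.name.startswith("-"):
        # The whole element may be discarded, content included.
        return ""
    raise UnsafeExtension(f"unrecognized element <{node.name}>")
```

Under this convention, an old processor that meets an unmarked `<i>` rejects the document instead of silently producing text whose meaning may have changed; only the author who introduces an extension can say which class it belongs to.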

 4) If you're going to use a standard, use *all* of the standard.
    HTML is based on SGML.  The <i> code is legitimate SGML.

SGML is just a language that mostly specifies syntax, not semantics.
<friddlefarb> is "legitimate SGML" because the syntax rules are met.  So
is this whole message.  But that isn't the point, any more than having a
lexically correct Fortran program proves that it is useful for a
particular application, or even that it will compile.  Nor would it make
sense to require that a C program exercise every feature of the language
and every library routine.

Which codes are really "legitimate" depends on a document type
definition (DTD), i.e., on how HTML is actually defined.  "<i>" is either
legitimate there, or it isn't.  If an HTML composing agent permits use
of <i>, and <i> isn't in the DTD, then the composing agent is broken. 
If an HTML reader-browser receives an element and doesn't know what to
do with it (presumably because it was written against an earlier DTD or
DTD-concept), it has bad data on its hands.  Programs that ignore
obvious bad data on the theory that anything unrecognized is noise are
likely to get into all sorts of trouble--that is nothing new.
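The distinction between "legitimate SGML" and "legitimate HTML" can be sketched as a check against the set of elements a DTD declares. This is a simplified illustration under stated assumptions: `DECLARED` is a hypothetical element set standing in for a DTD (it is not the real HTML DTD), and the regex scanner only finds tag names; it is not an SGML parser.

```python
import re

# Simplified tag scanner: finds the name in <name ...> and </name>.
# Not an SGML parser: comments, marked sections, entities, etc. are ignored.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9.-]*)")

# Hypothetical element set standing in for a DTD (not the real HTML DTD).
DECLARED = {"html", "head", "title", "body", "p", "a", "h1"}


def undeclared_elements(text: str, declared: set) -> list:
    """Return, in order of first use, element names the DTD doesn't declare."""
    found = []
    for match in TAG.finditer(text):
        name = match.group(1).lower()  # HTML element names are case-insensitive
        if name not in declared and name not in found:
            found.append(name)
    return found


def compose(text: str, declared: set = DECLARED) -> str:
    """A composing agent that refuses markup the DTD does not allow."""
    bad = undeclared_elements(text, declared)
    if bad:
        raise ValueError("not declared in the DTD: " + ", ".join(bad))
    return text
```

A reader-browser could run the same check on what it receives and treat a non-empty result as bad data to be reported, rather than as noise to be dropped.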

Now, in the HTML / WWW case in particular, I'd think that even a weak
browsing engine has to have hypertext pop-up or exploder capability. 
Elementary robust programming strategies would argue that encountering
an unknown element should cause an "unrecognized <i> element, ...,
what do you want to do about it" message to appear.  These browsers are
language processors, and we all know that unrecognized input turns up
in language processors.
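A rough Python sketch of the "ask the user" strategy, assuming a deliberately flat document model: a list of (element name, text) spans, with `None` marking plain text. The names `render`, `SHOW`, `HIDE`, `STOP`, and `RenderingStopped` are hypothetical; the `ask` callback stands in for whatever pop-up the browser provides.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical answers a pop-up might offer for an unrecognized element.
SHOW, HIDE, STOP = "show", "hide", "stop"


class RenderingStopped(Exception):
    """The user chose not to continue past an unrecognized element."""


def render(spans: Iterable[Tuple[Optional[str], str]], known: set,
           ask: Callable[[str], str]) -> str:
    """Render flat (element name, text) spans; a name of None is plain text.

    Known elements contribute their text (styling is omitted here). For an
    unknown element the user is asked what to do instead of the element
    being silently dropped.
    """
    out = []
    for name, text in spans:
        if name is None or name in known:
            out.append(text)
            continue
        choice = ask(name)  # e.g. "unrecognized <i> element, ..., what now?"
        if choice == SHOW:
            out.append(text)
        elif choice == HIDE:
            continue
        else:
            raise RenderingStopped(f"stopped at unrecognized <{name}>")
    return "".join(out)
```

Passing the prompt in as a callback keeps the decision with the user at the moment the unknown element is met, and lets a test supply a stub in place of a real pop-up.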


<Prev in Thread] Current Thread [Next in Thread>