perl-unicode

Re: In-Band Information Considered Harmful

1998-10-26 00:50:55
You, Dan Sugalski, wrote:
++ 
++ 
++ Well, I was thinking of something along these lines.
++ 
++ Assume four separate, completely unrelated streams of meta-data:
++ 
++ * Language (French, English, Latin, Perl)
++ * Display properties (color, font, size, orientation)
++ * HTML Markup
++ * Glossary links
++ 
++ If our source string is:
++ 
++ A simple perl statement looks something like print $foo, "\n". Easy, isn't
++ it?
++ 
++ Fully marked up in-band it looks like:
++ 
++ <LANGUAGE=English><P><COLOR=black><BGCOLOR=white><FONT=Albertus><FONTSIZE=12>
++ <FONTUNIT=point>A simple <A
++ HREF="http://www.perl.com"><GLOSSARY=perl><I>perl</I></A></GLOSSARY>
++ <GLOSSARY=statement>statement</GLOSSARY> looks something like
++ </FONT></LANGUAGE><LANGUAGE=perl><FONT=Courier><CODE><GLOSSARY=Print
++ SUBGLOSSARY=perl>print
++ <FONTSTYLE=italic><GLOSSARY=Scalar variable
++ SUBGLOSSARY=perl>$foo</FONTSTYLE></GLOSSARY>,
++ <FONTMOD=notypographerquote>"</FONTMOD>\n<FONTMOD=notypographerquote>"</FONTMOD>
++ </FONT></LANGUAGE><LANGUAGE=English><FONT=Albertus>.
++ Easy, isn't it?
++ 
++ Now, extract the text and HTML markup. A task made more difficult by the
++ fact that the <FONT> tags are actually display property metadata, not HTML
++ metadata.
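
A minimal sketch of that extraction, in Python for brevity. The
STREAM_OF table and the keep_streams() helper are hypothetical names,
and the assignment of tags to streams is an assumption: nothing in the
in-band text itself says that <FONT> is display metadata rather than
HTML, so that knowledge has to be supplied from outside.

```python
import re

# Matches opening and closing tags such as <P>, </FONT>, <LANGUAGE=English>.
TAG = re.compile(r"<(/?)([A-Za-z]+)([^>]*)>")

# Hypothetical assignment of tag names to metadata streams (an assumption).
STREAM_OF = {
    "P": "html", "A": "html", "I": "html", "CODE": "html",
    "LANGUAGE": "language",
    "GLOSSARY": "glossary",
    "FONT": "display", "FONTSIZE": "display", "FONTUNIT": "display",
    "FONTSTYLE": "display", "FONTMOD": "display",
    "COLOR": "display", "BGCOLOR": "display",
}

def keep_streams(marked_up, wanted):
    """Drop every tag whose stream is not in `wanted`; unknown tags are dropped too."""
    def repl(match):
        name = match.group(2).upper()
        return match.group(0) if STREAM_OF.get(name) in wanted else ""
    return TAG.sub(repl, marked_up)

sample = ('<LANGUAGE=English><P><FONT=Albertus>A simple <I>perl</I> '
          '<GLOSSARY=statement>statement</GLOSSARY></FONT></P></LANGUAGE>')
```

Here keep_streams(sample, {"html"}) leaves only the text and the tags
listed as "html", and keep_streams(sample, set()) leaves the bare text.
The code is trivial; the lookup table is the hard part.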

This is complicated even further because SGML/XML (and HTML is moving
in that direction as well) uses both in-band *and* out-of-band markup:
in-band for "structural" markup, out-of-band for rendering purposes
(style sheets).

Properly written SGML/XML documents will not have markup for "bold" or
"italics" - after all, how do bold and italics sound? How useful is it
to look for italics when you're using a VT100? A document might come
with a style sheet that suggests certain renderings for certain devices,
but those can be ignored or overruled.

Your example might look like:

<P><LANGUAGE=English>A simple <A HREF="http://www.perl.com">
<GLOSSARY class = important>perl</GLOSSARY></A> <GLOSSARY>statement</GLOSSARY>
looks something like <LANGUAGE=perl><CODE class = perl>
<GLOSSARY=Print SUBGLOSSARY=perl>print
<GLOSSARY class = variable SUBGLOSSARY=perl>$foo</GLOSSARY>,
&#12345;\n&#12345;</CODE></LANGUAGE>. Easy, isn't it?</LANGUAGE></P>

With metadata:
p.foreground: black
p.background: white
p.font:       albertus
p.font-size:  12pt
glossary.{important,variable}.font-style: italics
code.perl.font: courier
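
As a rough sketch of how a renderer might consume such out-of-band
rules (Python for brevity; parse_rules() and style_for() are
hypothetical names, and the "element.class.property" reading of the
notation above, including the {a,b} class list, is an assumption):

```python
STYLE_SHEET = """\
p.foreground: black
p.background: white
p.font:       albertus
p.font-size:  12pt
glossary.{important,variable}.font-style: italics
code.perl.font: courier
"""

def parse_rules(text):
    """Return {(element, class_or_None): {property: value}}."""
    rules = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        selector, value = (part.strip() for part in line.split(":", 1))
        parts = selector.split(".")
        element, prop = parts[0], parts[-1]
        # A middle part, if present, names one class or a {a,b} list of classes.
        classes = parts[1].strip("{}").split(",") if len(parts) == 3 else [None]
        for cls in classes:
            rules.setdefault((element, cls), {})[prop] = value
    return rules

def style_for(rules, element, cls=None, overrides=None):
    """Element-wide rules first, then class rules, then the device's own overrides."""
    element = element.lower()
    style = dict(rules.get((element, None), {}))
    if cls is not None:
        style.update(rules.get((element, cls), {}))
    style.update(overrides or {})
    return style

rules = parse_rules(STYLE_SHEET)
```

A device that cannot do italics can pass its own overrides, for example
style_for(rules, "GLOSSARY", "important", {"font-style": "underline"}),
or skip the style sheet entirely. The document itself does not change.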


Abigail