You, Dan Sugalski, wrote:
++
++
++ Well, I was thinking of something along these lines.
++
++ Assume four separate, completely unrelated streams of meta-data:
++
++ * Language (French, English, Latin, Perl)
++ * Display properties (color, font, size, orientation)
++ * HTML Markup
++ * Glossary links
++
++ If our source string is:
++
++ A simple perl statement looks something like print $foo, "\n". Easy, isn't
++ it?
++
++ Fully marked up in-band it looks like:
++
++ <LANGUAGE=English><P><COLOR=black><BGCOLOR=white><FONT=Albertus><FONTSIZE=12>
++ <FONTUNIT=point>A simple <A
++ HREF="http://www.perl.com"><GLOSSARY=perl><I>perl</I></A></GLOSSARY>
++ <GLOSSARY=statement>statement</GLOSSARY> looks something like
++ </FONT></LANGUAGE><LANGUAGE=perl><FONT=Courier><CODE><GLOSSARY=Print
++ SUBGLOSSARY=perl>print
++ <FONTSTYLE=italic><GLOSSARY=Scalar variable
++ SUBGLOSSARY=perl>$foo</FONTSTYLE></GLOSSARY>,
++ <FONTMOD=notypographerquote>"</FONTMOD>\n<FONTMOD=notypographerquote>"</FONTMOD>
++ </FONT></LANGUAGE><LANGUAGE=English><FONT=Albertus>.
++ Easy, isn't it?
++
++ Now, extract the text and HTML markup. A task made more difficult by the
++ fact that the <FONT> tags are actually display property metadata, not HTML
++ metadata.

This is further complicated by the fact that XML and SGML (and HTML is
moving in that direction as well) use both in-band *and* out-of-band
markup: in-band for "structural" markup, out-of-band for rendering
(style sheets). Properly written SGML/XML documents will not have markup
for "bold" or "italics". After all, how do bold and italics sound? How
useful is it to look for italics when you're using a VT100? A document
might come with a style sheet that suggests certain renderings for
certain devices, but those suggestions can be ignored or overruled.

Your example might look like:
<P><LANGUAGE=English>A simple <A HREF="http://www.perl.com">
<GLOSSARY class = important>perl</GLOSSARY></A> <GLOSSARY>statement</GLOSSARY>
looks something like <LANGUAGE=perl><CODE class = perl>
<GLOSSARY=Print SUBGLOSSARY=perl>print</GLOSSARY>
<GLOSSARY class = variable SUBGLOSSARY=perl>$foo</GLOSSARY>,
"\n"</CODE></LANGUAGE>. Easy, isn't it?</LANGUAGE></P>
With metadata:
p.foreground: black
p.background: white
p.font: albertus
p.font-size: 12pt
glossary.{important,variable}.font-style: italics
code.perl.font: courier
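
As a rough sketch of how a renderer might combine the structural markup
with the out-of-band metadata above (in Python, purely for illustration):
the names STYLE_SHEET and resolve_style are made up, and the rule that
inner elements inherit from and override outer ones is my assumption, not
something any particular style sheet language is guaranteed to do.

```python
# Hypothetical out-of-band style rules, keyed by (element, class).
# A class of None means the rule applies to every instance of the element.
STYLE_SHEET = {
    ("p", None): {
        "foreground": "black",
        "background": "white",
        "font": "albertus",
        "font-size": "12pt",
    },
    ("glossary", "important"): {"font-style": "italics"},
    ("glossary", "variable"): {"font-style": "italics"},
    ("code", "perl"): {"font": "courier"},
}


def resolve_style(path, sheet=STYLE_SHEET):
    """Merge the rules for each (element, class) on the path, outermost first.

    Assumption: inner elements inherit every property from outer ones and
    override only the properties they set themselves.
    """
    style = {}
    for element, cls in path:
        # Rules for the bare element come first, then the class-specific ones.
        style.update(sheet.get((element, None), {}))
        if cls is not None:
            style.update(sheet.get((element, cls), {}))
    return style


# $foo sits inside <P>, <CODE class=perl> and <GLOSSARY class=variable>.
foo_path = [("p", None), ("code", "perl"), ("glossary", "variable")]
foo_style = resolve_style(foo_path)

# A device that cannot render fonts (a VT100, say) can ignore the sheet
# entirely; the structural markup is untouched either way.
plain_style = resolve_style(foo_path, sheet={})
```

The point is that the text and its structure never change; only the
sheet handed to the renderer does.
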
Abigail