xsl-list

Re: [xsl] SGML to XML

2009-02-25 18:12:18
On 25 Feb 2009, at 07:57, Graeme Kidd wrote:

> Is it possible to convert an SGML file to XML using XSLT?

> Currently I am using the free AltovaXML XSLT 2.0 engine:
> http://www.altova.com/altovaxml.html

> I have a feeling it might not be able to read in SGML, so does
> anyone have any suggestions for applications that can help me
> convert SGML to XML? If such applications exist, will they use
> XSLT to transform, or am I looking at some other language?

It's not completely clear what exactly you want to do; the
precise answer to your question may depend on which of two
similar questions you really mean to be asking.

(1) If your essential goal is to convert SGML documents to XML,
several tools exist that can help.  None of them use XSLT -- or
any other language! -- to specify the transformation, because
SGML and XML share the same basic notions of tree structure.  The
transformations they perform are purely mechanical ones:
supplying omitted end-tags, supplying omitted delimiters of other
sorts, normalizing the case of names, expanding entity
references, that kind of thing.

The one tool in this class I've used myself in the past is James
Clark's SX, the precursor of OpenSP's osx, which Kevin Bray has
already pointed to.

  http://www.jclark.com/sp/sx.htm

The main problem I had with SX was that its method of normalizing
names was to uppercase all of them, regardless of the case in
which the name was given when declared.  Since I was unable to
put up with the resulting ugliness, I had to devise and run
another processing step to re-normalize things using the correct
case.  (Later, I found a stylesheet from Wendell Piez that makes
this easy; unfortunately, I can't find it on the Web now.
Wendell, it *is* on the Web, isn't it?)
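
Wendell's stylesheet is not reproduced here. As a rough illustration
of the same idea only, here is a minimal Java sketch using the DOM and
JAXP classes bundled with the JDK. It assumes you can supply the element
names exactly as declared in the DTD (the declaredNames list is
hypothetical input), maps each uppercased name back to its declared
spelling, and renames the elements. Attribute names, which SX also
uppercases, are left alone to keep it short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: restore declared-case element names in all-uppercase XML (e.g. SX output).
// Assumption: declaredNames lists element names exactly as declared in the DTD.
class CaseRenormalizer {

    static String renormalize(String upperCaseXml, List<String> declaredNames) {
        try {
            // Map each uppercased name back to its declared spelling.
            Map<String, String> byUpper = new HashMap<>();
            for (String name : declaredNames) {
                byUpper.put(name.toUpperCase(Locale.ROOT), name);
            }
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(upperCaseXml)));
            renameTree(doc, doc.getDocumentElement(), byUpper);

            // Serialize the renamed tree with an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Rename this element and its element descendants; attributes are not touched.
    private static void renameTree(Document doc, Element el, Map<String, String> byUpper) {
        String declared = byUpper.get(el.getTagName());
        Element current = el;
        if (declared != null && !declared.equals(el.getTagName())) {
            current = (Element) doc.renameNode(el, null, declared);
        }
        NodeList children = current.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                renameTree(doc, (Element) child, byUpper);
            }
        }
    }
}
```

For example, renormalizing "<DOC><TITLE>Hi</TITLE></DOC>" against the
declared names "Doc" and "Title" gives "<Doc><Title>Hi</Title></Doc>".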

(2) If, on the other hand, your problem is that you have some
SGML documents, and you plan to keep them in SGML, and you
really just want to process them with XSLT, or you want to
convert them into XML that is not isomorphic to the SGML,
then -- well, strictly speaking, Ken Holman's statement that XSLT
requires XML input is an oversimplification.  What the spec says
is:

    A transformation expressed in XSLT describes rules for
    transforming a source tree into a result tree.

That is, XSLT requires input in the form of instances of its data
model (the XPath data model in XSLT 1.0, XDM in XSLT 2.0), and
produces output similarly structured.  By far the most common way
to provide such an instance is to parse an XML document.  But the
spec doesn't require that: any method of producing an instance of
the data model will do.
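
To make that concrete, a minimal Java sketch using the JAXP classes
that ship with the JDK: the source tree is built node by node with the
DOM API, never parsed from XML text, and handed to the processor as a
DOMSource. The element names and the stylesheet are hypothetical; only
the stylesheet itself is parsed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: an XSLT source tree built in memory, with no XML text parsed for it.
class InMemoryTreeDemo {

    // Hypothetical stylesheet: list the title of every <section>.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//section'><xsl:value-of select='@title'/>;</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        try {
            // Build <report><section title="Intro"/><section title="Method"/></report>.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("report");
            doc.appendChild(root);
            for (String title : new String[] {"Intro", "Method"}) {
                Element section = doc.createElement("section");
                section.setAttribute("title", title);
                root.appendChild(section);
            }
            // Hand the in-memory tree to the processor as a DOMSource.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling InMemoryTreeDemo.run() should yield "Intro;Method;".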

In principle, if you could lay your hands on an SGML parser that
can produce SAX events, you could apply XSLT to SGML without
trouble, using any SAX-consuming XSLT processor (Saxon or Xalan,
to name two) to process the SGML data.  Or you could build a DOM
and pass it to any XSLT processor that accepts documents in DOM
form.
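
On the SAX side, JAXP lets a program push events straight into a
compiled stylesheet through a TransformerHandler. The minimal Java
sketch below uses the JDK's bundled processor; the hard-coded events
are an assumption standing in for what a hypothetical SGML parser might
report, with the omitted end-tags already supplied.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

// Sketch: SAX events pushed straight into an XSLT processor.
// The hard-coded events stand in for a hypothetical SGML parser.
class SaxPushDemo {

    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='count(//item)'/> items</xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
                throw new IllegalStateException("this processor cannot accept SAX input");
            }
            TransformerHandler handler = ((SAXTransformerFactory) tf)
                    .newTransformerHandler(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out)); // must precede startDocument()

            // What a parser might report for "<list><item>a<item>b</list>",
            // with the omitted </item> end-tags already supplied.
            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "list", "list", noAttributes);
            for (String text : new String[] {"a", "b"}) {
                handler.startElement("", "item", "item", noAttributes);
                handler.characters(text.toCharArray(), 0, text.length());
                handler.endElement("", "item", "item");
            }
            handler.endElement("", "list", "list");
            handler.endDocument(); // the result is written once the document ends
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling SaxPushDemo.run() should yield "2 items". Nothing in the
stylesheet knows the events did not come from an XML parser.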

In practice, I discover that Google can't tell me about any SGML
parser on the Web that says it can emit SAX events.  This
surprises me a bit: the thing I wanted most, in the first ten
years I used SGML, was something very much like XSLT.  Do all the
people currently using SGML find themselves so happy with their
existing tools that they never want to use XSLT?  Wow; better
tools than I remember having :)

I wonder if there would be any market for a tool that ran SP on
SGML input and produced SAX output, so that it could be run as a
front-end to Saxon.
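
SP's nsgmls (onsgmls in OpenSP) can already write its parse as ESIS, a
line-oriented text format, so one low-effort shape for such a front-end
is a small adapter that replays ESIS lines as SAX events. What follows
is a hypothetical Java sketch, not an existing tool: it handles only
start-tag "(", end-tag ")", data "-" and attribute "A" lines, decodes
only the \n and \\ escapes, and skips everything else; the exact format
should be checked against the nsgmls documentation. Here the events go
to the JDK's identity TransformerHandler, which writes XML; a
TransformerHandler built from a stylesheet would take them the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical ESIS-to-SAX bridge: replays nsgmls-style ESIS lines as SAX events.
// Only '(' (start-tag), ')' (end-tag), '-' (data) and 'A' (attribute) lines are
// handled; every other ESIS command is skipped in this sketch.
class EsisToSax {

    static void replay(Reader esis, ContentHandler out) throws IOException, SAXException {
        BufferedReader in = new BufferedReader(esis);
        AttributesImpl pending = new AttributesImpl(); // attributes for the next start-tag
        out.startDocument();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            char command = line.charAt(0);
            String rest = line.substring(1);
            switch (command) {
                case 'A': { // "Aname TYPE value"; IMPLIED attributes carry no value
                    String[] parts = rest.split(" ", 3);
                    if (parts.length >= 2 && !parts[1].equals("IMPLIED")) {
                        String value = parts.length == 3 ? unescape(parts[2]) : "";
                        pending.addAttribute("", parts[0], parts[0], "CDATA", value);
                    }
                    break;
                }
                case '(':
                    out.startElement("", rest, rest, pending);
                    pending = new AttributesImpl();
                    break;
                case ')':
                    out.endElement("", rest, rest);
                    break;
                case '-': {
                    String text = unescape(rest);
                    out.characters(text.toCharArray(), 0, text.length());
                    break;
                }
                default:
                    break; // PIs, entity references, 'C', etc. are ignored here
            }
        }
        out.endDocument();
    }

    // Decodes only \n (record end) and \\; ESIS defines further escapes not handled here.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                if (next == 'n') {
                    sb.append('\n');
                } else if (next == '\\') {
                    sb.append('\\');
                } else {
                    sb.append('\\').append(next); // leave unknown escapes as-is
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // ESIS text in, serialized XML out, via the JDK's identity TransformerHandler.
    static String toXml(String esis) {
        try {
            SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler identity = tf.newTransformerHandler();
            identity.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.setResult(new StreamResult(out));
            replay(new StringReader(esis), identity);
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For the ESIS lines "(DOC", "AID CDATA s1", "(P", "-Hi", ")P", ")DOC",
toXml should produce <DOC><P ID="s1">Hi</P></DOC>.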

I hope this helps.

Michael Sperberg-McQueen



--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

