Hey Gary,
At 09:00 AM 3/14/2006, you wrote:
I'm currently outputting as XML and it
should only be the last stage in the chain that outputs as XHTML. The
issue seems to be that the input includes declared entities that
nothing on the later part of the chain understands. Therefore I want
the unicode entity instead so &#160; rather than &nbsp; for example.
If the input includes declared entities, where are the declarations?
It's true that HTML in the wild often leaves these declarations out.
Nonetheless, if the input is to be parsed as XML, these declarations
must be available. It could be that your input isn't even
syntactically correct, well-formed XML (which is a way of saying
things redundantly over again, as it's not XML until it follows the
rules), in which case you need to start asking the questions Andrew
has posed, and considering tools to fix the syntax. (It's no fun by
hand.) On the other hand, maybe it's only the entity declarations
that are missing, in which case providing your input with a DTD or
DTD fragment that contains those declarations will be sufficient.
If the entity declaration is available, the document can be parsed
and presented to the XSLT engine for transformation. If it isn't, the
parse fails, and there's nothing XSLT can do to help.
(Accordingly, it's not an XSLT question, but a basic XML question:
you'd have this problem even if you weren't using XSLT.)
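To see that this happens with no XSLT anywhere in sight, here is a minimal sketch using the XML parser in Python's standard library (my choice for illustration, not part of your toolchain). The sample document and the helper name try_parse are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: it uses &nbsp; but declares it nowhere.
UNDECLARED = "<p>one&nbsp;two</p>"


def try_parse(text):
    """Return (True, element) on success, (False, error message) on failure."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError as err:
        # The parser rejects the document before any later stage sees it.
        return False, str(err)


ok, detail = try_parse(UNDECLARED)
print(ok, detail)
```

The parse is refused outright, which is the same kind of failure your XSLT processor's parser is reporting.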
Should the stylesheet automatically do this? Is there some way I can
force a text() catch in the template to convert the characters for me?
Nope. An analogy: that's like putting the cake in the oven before the
batter is mixed. You can't expect to put flour, eggs, sugar etc.
straight into the oven and get "cake". Fortunately, with XSLT you
won't get a mess of baked flour and eggs and melted sugar -- but you
will get the error message you're seeing.
Hint: the particular declaration you're looking for looks like:
<!ENTITY nbsp "&#160;"> <!-- no-break space = non-breaking space,
U+00A0 ISOnum -->
In the XHTML DTD, it's to be found in the xhtml-lat1.ent file.
But if you put the DOCTYPE declaration at the top of your input
<!DOCTYPE html [
<!ENTITY nbsp "&#160;"> ]>
-- and if everything else is good (all other entities are declared,
syntax is correct) -- you'll be okay. (This is for testing. If you
have more than one input document you'll want to call the DTD in
through an external identifier, either SYSTEM or PUBLIC depending on
your parser and environment.)
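If you want a quick check that an internal-subset declaration of this kind is enough, the following sketch may help. It again assumes Python's standard-library parser rather than Saxon, and the test document is invented; the point is only that a declared nbsp comes through as the single character U+00A0.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: the internal DTD subset declares nbsp.
DECLARED = """<!DOCTYPE html [
<!ENTITY nbsp "&#160;">
]>
<html><body><p>one&nbsp;two</p></body></html>"""

root = ET.fromstring(DECLARED)
text = root.find("body/p").text

# The entity reference has been replaced by the character it stands for.
print(repr(text))
```

By the time anything downstream sees the paragraph, there is no entity left, only the no-break space character itself.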
Note that how these characters are expressed in the *output* is not
addressed here. You can figure that out once you've got your files parsing.
Oh and since I forgot to mention I'm using Saxon 8 and XSLT 2.0.
That's good; it gives you a number of ways of controlling how those
characters appear in the output. But you have to get them in first.
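As an illustration of why the two concerns are separate (this is not Saxon, just Python's standard-library serializer standing in for "an output stage"), the sketch below writes the same parsed character two ways. The element and its content are made up for the example.

```python
import xml.etree.ElementTree as ET

# A tree that already contains the no-break space as a plain character.
p = ET.Element("p")
p.text = "one\u00a0two"

# An ASCII-only output encoding forces a numeric character reference.
as_ascii = ET.tostring(p, encoding="us-ascii")

# A UTF-8 output encoding writes the character's bytes directly.
as_utf8 = ET.tostring(p, encoding="utf-8")

print(as_ascii)
print(as_utf8)
```

The same idea applies in XSLT 2.0, where the output encoding and character maps are the usual levers, but none of them come into play until the input parses.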
Cheers,
Wendell
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================