Re: [xsl] Confused about entities

2006-03-14 08:03:21
Hey Gary,

At 09:00 AM 3/14/2006, you wrote:
I'm currently outputting as XML and it
should only be the last stage in the chain that outputs as XHTML. The
issue seems to be that the input includes declared entities that
nothing on the later part of the chain understands. Therefore I want
the unicode entity instead so &#160; rather than &nbsp; for example.

If the input includes declared entities, where are the declarations?

It's true that HTML in the wild often leaves these declarations out. Nonetheless, if the input is to be parsed as XML, these declarations must be available. It could be that your input isn't even syntactically correct, well-formed XML (which is a way of saying things redundantly over again, as it's not XML until it follows the rules), in which case you need to start asking the questions Andrew has posed, and considering tools to fix the syntax. (It's no fun by hand.) On the other hand, maybe it's only the entity declarations that are missing, in which case providing your input with a DTD or DTD fragment that contains those declarations will be sufficient.

If the entity declaration is available, the document can be parsed and presented to the XSLT engine for transformation. If it can't be parsed, there's nothing XSLT can do to help.

(Accordingly, it's not an XSLT question, but a basic XML question: you'd have this problem even if you weren't using XSLT.)

Should the stylesheet automatically do this? Is there some way I can
force a text() catch in the template to convert the characters for me?

Nope. An analogy: that's like putting the cake in the oven before the batter is mixed. You can't expect to put flour, eggs, sugar etc. straight into the oven and get "cake". Fortunately, with XSLT you won't get a mess of baked flour and eggs and melted sugar -- but you will get the error message you're seeing.

Hint: the particular declaration you're looking for looks like:

<!ENTITY nbsp   "&#160;"> <!-- no-break space = non-breaking space,
                                  U+00A0 ISOnum -->

In the XHTML DTD, it's to be found in the xhtml-lat1.ent file.

But if you put the DOCTYPE declaration at the top of your input

<!DOCTYPE html [
<!ENTITY nbsp   "&#160;"> ]>

-- and if everything else is good (all other entities are declared, syntax is correct) -- you'll be okay. (This is for testing. If you have more than one input document you'll want to call the DTD in through an external identifier, either SYSTEM or PUBLIC depending on your parser and environment.)
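For more than one document, the external reference might look something like the sketches below. The first assumes your input really is meant to be XHTML 1.0 Strict, which I can't tell from here; whether the parser fetches the DTD from the network or from a local catalog depends on your setup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- PUBLIC identifier plus a SYSTEM URI for the whole XHTML 1.0 Strict DTD -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Entity test</title></head>
  <body><p>one&nbsp;two</p></body>
</html>
```

The second pulls in only the Latin-1 entity set through a parameter entity, the same way the XHTML DTD itself does. The relative path is an assumption: it supposes a local copy of xhtml-lat1.ent sitting next to the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html [
  <!-- declare the external entity set, then reference it so its
       declarations (nbsp among them) become available -->
  <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
      "xhtml-lat1.ent">
  %HTMLlat1;
]>
<html>
  <body><p>one&nbsp;two</p></body>
</html>
```

Either way, the parser has to be one that reads external DTD content, and it has to be able to find the file.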

Note that how these characters are expressed in the *output* is not addressed here. You can figure that out once you've got your files parsing.

Oh and since I forgot to mention I'm using Saxon 8 and XSLT 2.0.

That's good; it gives you a number of ways of controlling how those characters appear in the output. But you have to get them in first.
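As one illustration of what becomes possible once the input parses, here is a minimal sketch using an XSLT 2.0 character map. The map name nbsp-as-ncr is made up for the example, and the identity template merely stands in for whatever your stylesheet actually does.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- tell the serializer to apply the character map named below -->
  <xsl:output method="xml" use-character-maps="nbsp-as-ncr"/>

  <!-- whenever U+00A0 is serialized, write the literal string &#160;
       (the replacement string is emitted without further escaping) -->
  <xsl:character-map name="nbsp-as-ncr">
    <xsl:output-character character="&#160;" string="&amp;#160;"/>
  </xsl:character-map>

  <!-- identity transform, as a placeholder for the real templates -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Another route is to serialize with encoding="us-ascii" on xsl:output, which makes the serializer write every non-ASCII character as a numeric character reference. Which of these suits you depends on what the later stages of your chain expect.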

Cheers,
Wendell



======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

