xsl-list

RE: Entities: The worst of both worlds :-(

2003-10-10 12:20:35
From Zarella's 2 Oct 2001 email:

There is a way to process character entities, but it requires a bit of
hacking to get the XML parser to work for you. Take your Entity declaration
files and create a new set that you will use just for transformation
purposes. Each entity will need to be modified to have the form:

<!ENTITY tilde "<ent>&amp;tilde;</ent>">

Now, this will create new <ent> elements in your XML file before it gets to
the XSLT processor. So, in XSLT, you can now use a template rule as follows:

<xsl:template match="ent">
<xsl:value-of disable-output-escaping="yes" select="text()"/>
</xsl:template>
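
As a rough illustration of what the parser does with such a declaration,
here is a sketch in Python rather than MSXML, using the standard library's
expat-based ElementTree parser. The "doc" root element, the sample text, and
the use of an internal DTD subset instead of a separate .ent file are all
assumptions made to keep the example short:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the entity is declared in the internal DTD
# subset here, rather than in a modified .ent file.
SAMPLE = (
    '<!DOCTYPE doc [\n'
    '  <!ENTITY tilde "<ent>&amp;tilde;</ent>">\n'
    ']>\n'
    '<doc>a&tilde;b</doc>'
)

root = ET.fromstring(SAMPLE)

# The parser replaces the reference with an <ent> element, so the
# stylesheet sees an element node instead of an expanded character.
ent = root.find("ent")
print(ent.tag)    # element name produced by the entity expansion
print(ent.text)   # the entity reference, now held as ordinary text
```

The text inside <ent> is the entity reference spelled out as plain
characters, which is why the template needs disable-output-escaping to write
it back out unchanged.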

This is very, very useful to me. I can now output XHTML with entity
references preserved. But the fun's not over yet. 

In addition to producing English XHTML, I need to output a set of Shift-JIS
encoded (Japanese) XHTML files (from Shift-JIS encoded Japanese XML source
provided by the translators).

I have some numeric entity references (&#nnnn;) in both my original XML
source, and also in my XSLT stylesheet, that are causing problems with this
Shift-JIS XML source.

The MSXML (v3.0) transformNodeToObject method I'm using to invoke the XSLT
stylesheet works without problem against the Shift-JIS encoded Japanese XML
source - I can view the resulting transformed, Shift-JIS encoded X(HT)ML -
but when I call the Save method on the XML object, it saves only up until
where the first &#nnnn; reference (such as &#160;) was, and reports an error
(Err.Number = 0; Err.Description is blank).  This might just be an MSXML
bug, I don't know. Any ideas?
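
One thing that may be worth checking (a guess, not a confirmed diagnosis of
the MSXML behaviour): &#160; is U+00A0, and Shift-JIS has no code for that
character, so anything writing Shift-JIS must either emit it as a character
reference or fail. The sketch below shows this in Python; its "shift_jis"
codec is only a stand-in for whatever MSXML does internally, and the sample
string is made up:

```python
# U+00A0 (NO-BREAK SPACE) is what &#160; becomes once the XML is parsed.
text = "日本語\u00a0テキスト"

try:
    text.encode("shift_jis")   # strict mode raises on unencodable characters
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Falling back to numeric character references keeps the output well-formed.
fallback = text.encode("shift_jis", errors="xmlcharrefreplace")

print(strict_ok)
print(b"&#160;" in fallback)
```

If the Save failure has the same cause, the characters worth looking at are
the ones the &#nnnn; references expand to, not the Japanese text itself.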

Finally, using Zarella's tip means that the resulting XML document must now
refer to a doctype that defines these "preserved" entities (such as
&nbsp;)... (when I said the output of the XSLT was XHTML, what I really
meant was "something very similar to XHTML, but without the external DTD
reference")... so now this whole process relies on being connected to the
Web (so that the XHTML "http://..." DTD reference can be resolved by MSXML,
so it's happy that the entities have definitions)... or otherwise I have to
insert a local file system-specific DOCTYPE SYSTEM DTD reference, which I'd
really like not to have... can someone throw me a lifeline, and point me in
the right direction of getting MSXML to validate an XML document when the
DOCTYPE refers to an "http://..." address, but you're not net-connected?
(I've read that you can use the Add method to associate an XDR/XSD schema
file with a schema URI... ah, boy... mebbe I should go look for the W3C
XHTML XSD files... I could only find the DTD and .ENT files last time I
looked... this would probably mean upgrading to v4 or ditching MSXML, 'cos
v3 supports only XDR... starting to ramble now, time to hit Send.)

Pearls of wisdom gratefully accepted.

Graham Hannington

P.S. For anyone out there who wants to try Zarella's tip, here are the
search'n'replace regular expressions I used (in jEdit) to hack the W3C XHTML
.ent files (a trivial, truly pitiful attempt at trying to "give something
back", I know):

Search expression:

<!ENTITY\s*([^\s]*)[^>]*>

Replace expression:

<!ENTITY $1 "<ent>&amp;$1;</ent>">
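
For anyone without jEdit to hand, roughly the same substitution can be
sketched in Python. This assumes the two regex dialects agree for this
pattern; Python writes the back-reference as \1 rather than $1. The
wrap_entities name and the sample line are illustrative, not copied from the
W3C files:

```python
import re

# Same pattern as the search expression above.
PATTERN = re.compile(r'<!ENTITY\s*([^\s]*)[^>]*>')
# Same replacement, with \1 in place of jEdit's $1.
REPLACEMENT = r'<!ENTITY \1 "<ent>&amp;\1;</ent>">'

def wrap_entities(ent_source: str) -> str:
    """Rewrite each entity declaration so its value is an <ent> wrapper."""
    return PATTERN.sub(REPLACEMENT, ent_source)

sample = '<!ENTITY nbsp   "&#160;"> <!-- no-break space -->'
print(wrap_entities(sample))
```

Trailing comments after each declaration are left alone, since the pattern
stops at the first ">" following the entity name.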

P.P.S. Off-topic (sorry), but I've seen similar queries on "appropriate"
discussion groups with only tumbleweeds for answers: does anyone know how to
make VBScript write Shift-JIS encoded files? (I can only "make" it do this
by employing XSLT and the XML Save method... VBScript's native
FileSystemObject seems limited to Unicode and ASCII... although I suspect
that, on a Japanese PC, it might default to Shift-JIS instead of ASCII.)

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


