Re: [xsl] character entities

2008-11-03 03:54:48
Hi,

I'm having a wee spot of bother with character entities.

It's character encoding rather than character entities.

This data is then put into fields within a Zend Search Lucene index, via
php (that's why I first "flattened" it).

This index data is then queried (again via php) and the results sent
to/rendered by a browser.

If I put &#241_; (minus the underline character, which I've added so
this email is not mis-parsed) in my original xml, and using
encoding="iso-8859-1" for it and my xsl stylesheet, then my xsl
transforms that into a (Spanish) n character with a tilde on top: ñ.

If I tell ZSL to index fields using 'iso-8859-1' encoding, my Spanish n
becomes: ñ. If I tell ZSL to index fields using 'utf-8' encoding, my
Spanish n becomes: ñ.

These sorts of issues are nearly always a case of writing in one
encoding and reading in another, and you just need to track down where
the reading and writing are happening - it could be a string to byte
conversion in your code, or parsing of the markup in the browser, or
even the text viewer you are using to check the output (such as the
eclipse output window).
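
To make the write-in-one, read-in-another mismatch concrete, here is a minimal sketch. It uses Python purely for illustration (the pipeline in this thread is PHP) and the variable names are made up for the example; it only shows what happens to the bytes of n-tilde when the reader assumes the wrong encoding.

```python
# Illustration only: the same character written in one encoding
# and read back in another.
text = "\u00f1"  # n with tilde, code point 241

utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xB1
latin1_bytes = text.encode("iso-8859-1")  # one byte: 0xF1

# Written as UTF-8, read as ISO-8859-1: each byte becomes its own character.
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Written as ISO-8859-1, read as UTF-8: a lone 0xF1 is not valid UTF-8,
# so a lenient reader substitutes the replacement character U+FFFD.
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(repr(misread_as_latin1))
print(repr(misread_as_utf8))
```

Two characters where one was expected usually points to UTF-8 bytes being read as a single-byte encoding; a replacement character or question mark usually points to the reverse.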

I believe I need to prevent all parsers bar the browser at the end from
parsing my "special characters", right? But how?

Not really, that's just a way of bypassing encoding problems and
doesn't address the underlying issue.

Latest effort: I tried using encoding="utf-8" for all levels: my original
xml, my xsl output, and the input to ZSL's index, & I also saved my xml file
as utf-8 format, and used the Spanish n inside my xml, i.e. ñ rather than
&#241;. Doing that, the Spanish n was preserved through the xsl output, but
ZSL stores it as: ñ, & that's also how my browser displays it.

Ahh ok, well that's the right approach, you just need to examine the
code at every step and isolate the point where it's going wrong -
you've got as far as the output of the transform ok, next is to
carefully step through what happens between that and "ZSL".
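
One way to isolate the failing step is to look at the raw bytes at each hand-off rather than at rendered text. The sketch below is a hypothetical helper, again in Python for illustration only; `classify` and the sample stages are assumptions, not part of ZSL or any library. It is a rough heuristic that checks which byte pattern for n-tilde is present.

```python
def classify(data: bytes) -> str:
    """Hypothetical helper: guess how n-tilde is encoded in raw bytes."""
    if b"\xc3\x83\xc2\xb1" in data:
        # UTF-8 bytes were decoded as ISO-8859-1 and re-encoded as UTF-8.
        return "double-encoded"
    if b"\xc3\xb1" in data:
        return "utf-8"
    if b"\xf1" in data:
        return "iso-8859-1"
    return "none"


# Simulated stages: a correct UTF-8 transform output, then a faulty step
# that reads those bytes as ISO-8859-1 before writing UTF-8 again.
transform_output = "Espa\u00f1a".encode("utf-8")
after_faulty_step = transform_output.decode("iso-8859-1").encode("utf-8")

for stage, data in [("transform output", transform_output),
                    ("after faulty step", after_faulty_step)]:
    print(stage, data.hex(), classify(data))
```

The first stage whose bytes stop being plain UTF-8 is where the wrong encoding is being assumed.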

Using the actual n-tilde character or the character reference 241
shouldn't make any difference, by the way...
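
As a quick check of that equivalence, the sketch below parses three versions of the same element with Python's standard `xml.etree.ElementTree` (used here only for illustration; the sample documents are made up): one with the character reference, one with the literal character as ISO-8859-1 bytes, and one with the literal character as UTF-8 bytes. Provided each encoding declaration matches the actual bytes, the parser should hand back the same text in all three cases.

```python
import xml.etree.ElementTree as ET

# Same content three ways; each declaration matches the bytes that follow.
as_reference = b'<?xml version="1.0" encoding="iso-8859-1"?><w>Espa&#241;a</w>'
as_latin1 = b'<?xml version="1.0" encoding="iso-8859-1"?><w>Espa\xf1a</w>'
as_utf8 = b'<?xml version="1.0" encoding="utf-8"?><w>Espa\xc3\xb1a</w>'

# After parsing, the text is a Unicode string regardless of how it was written.
texts = [ET.fromstring(doc).text for doc in (as_reference, as_latin1, as_utf8)]
print(texts)
```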


cheers
-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
