xsl-list

Re: [xsl] How can I preserve ASCII Encoding Character Sets?

2012-11-06 16:04:41
At 2012-11-06 16:54 -0500, Philip Vallone wrote:
I want to preserve &#160; or &#xA0; as &#160; or &#xA0;.

The character or the markup? I think you want to preserve the markup but your process is trying to preserve the character and is messing up the character set.

Currently, as an example, &#160; will output into my resulting file as a space,

Actually, it is a non-breaking space (NBSP) and not a space.

but when the resulting xsl file is used to transform the xml file to FO it prints out a bad character "Å".

Yes, that happens when the stream is in UTF-8 but you've told your processor the stream is in US-ASCII.
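A minimal sketch of that mismatch, in Python (chosen only for illustration; the thread does not say which language is involved). It assumes a lenient consumer that falls back to Latin-1 for bytes outside US-ASCII; the exact stray glyph you see depends on which fallback your tool applies.

```python
# NBSP (U+00A0) is outside US-ASCII, so UTF-8 needs two bytes for it.
nbsp = "\u00a0"
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# A strict US-ASCII reader cannot accept those bytes at all.
try:
    utf8_bytes.decode("ascii")
    strict_ascii_ok = True
except UnicodeDecodeError:
    strict_ascii_ok = False
print("strict ASCII decode succeeded:", strict_ascii_ok)

# A lenient reader that falls back to a single-byte encoding (Latin-1 is
# an assumption here) turns the two bytes into two characters: a stray
# accented letter followed by the NBSP itself.
misread = utf8_bytes.decode("latin-1")
print(len(misread), [hex(ord(c)) for c in misread])
```

The single NBSP becomes two characters under the wrong decoding, which is the kind of stray character that shows up in the FO output.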

I have nailed down the issue to when I convert my stylesheet into one.

How are you doing that conversion? If you use XSLT then the problem is not in that step but somewhere else.

I hope this explains my issue. I appreciate all the help.

If you use native XML tools to go from one XML file to another (in this case your piecemeal XSLT stylesheets to the aggregate XSLT stylesheet), then you won't have a problem.

If you use Java or some other programming language, which isn't native XML, then it is likely there that the problems are being introduced with the character set.

Reading the evidence you provide here, it looks like you are using an XML processor from another language to read the stylesheet. That processor converts the numeric character reference into a Unicode character, and your program then writes that character out as UTF-8, losing the markup of the numeric character reference. The resulting file still says "US-ASCII" at the top while its content is encoded in UTF-8.
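The failure described above can be reproduced in a few lines. This is only a sketch under assumptions: Python's standard `xml.etree.ElementTree` stands in for whatever XML processor is really in use, the `<text>` document is a made-up fragment, and the hand-built output string imitates a program that copies the old declaration while writing characters as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: US-ASCII declaration, NBSP written as a
# numeric character reference.
source = b'<?xml version="1.0" encoding="US-ASCII"?><text>a&#160;b</text>'

# The XML parser resolves the reference; the program only ever sees
# the Unicode character U+00A0, never the "&#160;" markup.
root = ET.fromstring(source)
print(repr(root.text))

# Naive output step: reuse the old declaration, write characters as UTF-8.
naive = (
    '<?xml version="1.0" encoding="US-ASCII"?><text>' + root.text + "</text>"
).encode("utf-8")

# The reference is gone and the bytes no longer match the declaration.
print(b"&#160;" in naive)
print(b"\xc2\xa0" in naive)
try:
    naive.decode("ascii")
    matches_declaration = True
except UnicodeDecodeError:
    matches_declaration = False
print("bytes are valid US-ASCII:", matches_declaration)
```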

I suggest you use XSLT to aggregate your stylesheet fragments into a single stylesheet (which is what I do in the obfuscation post I made earlier). Your end result will then be in UTF-8, and the declaration at the top will indicate or imply UTF-8 as well. You will lose the numeric character reference, as it will be replaced by the Unicode character, but this is fine because the declaration and the actual encoding of your output will agree.
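The point is that an XML-aware serializer chooses the declaration and the byte encoding together. The sketch below is not XSLT; it uses the serializer in Python's standard `xml.etree.ElementTree` (Python 3.8 or later is assumed for the `xml_declaration` argument of `tostring`) purely to illustrate the principle on a made-up `<text>` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical content holding an NBSP as a numeric character reference.
root = ET.fromstring(b"<text>a&#160;b</text>")

# Serialized as UTF-8: the NBSP becomes the literal two-byte sequence and
# the declaration written by the serializer names the same encoding.
utf8_doc = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Serialized as US-ASCII: the serializer has to fall back to a numeric
# character reference, again consistent with its own declaration.
ascii_doc = ET.tostring(root, encoding="us-ascii", xml_declaration=True)

print(utf8_doc)
print(ascii_doc)

# Either way, a conforming parser recovers the same character.
same = ET.fromstring(utf8_doc).text == ET.fromstring(ascii_doc).text
print("same content after re-parsing:", same)
```

Whether the NBSP appears as a reference or as a literal character is a serialization detail; the information in the document is the same as long as the declaration is honest.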

Then treat the aggregated stylesheet in your encrypt/decrypt process as an octet stream, not as a string of characters, thus avoiding any interpretation of UTF-8 on the way in or out. Or use strings if you can guarantee fidelity between your input and your output.
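A sketch of the octet-stream approach, under loud assumptions: `xor_bytes` is a toy reversible transform standing in for your real encrypt/decrypt step (it is not secure), and the key and the document are invented for the example. What matters is that the data stays as bytes from input to output.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy reversible transform standing in for real encryption (not secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical aggregated stylesheet content, already serialized as UTF-8.
stylesheet = (
    '<?xml version="1.0" encoding="UTF-8"?><text>a\u00a0b</text>'
).encode("utf-8")
key = b"hypothetical-key"

# Bytes in, bytes out: no character decoding happens anywhere.
encrypted = xor_bytes(stylesheet, key)
decrypted = xor_bytes(encrypted, key)
print("octet round trip is exact:", decrypted == stylesheet)

# Contrast: passing through a string with the wrong encoding assumption
# silently replaces the NBSP bytes and corrupts the document.
lossy = stylesheet.decode("ascii", errors="replace").encode("utf-8")
print("string round trip is exact:", lossy == stylesheet)
```

In real code the bytes would come from a file opened in binary mode, so that no implicit decoding sneaks in at read or write time.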

This should get around your characters being UTF-8 and your declaration being ASCII.

I hope this helps.

. . . . . . . . . . . . Ken


--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/s/
G. Ken Holman                   mailto:gkholman(_at_)CraneSoftwrights(_dot_)com
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal

