Re: [xsl] Illegal xml chars

The purpose of using XML, or of using a standard at all, is that you know that supplier and receiver understand the format and that you need not worry about vendor-specific formats or deviations. XML is a very free language, but the standard does dictate that when a document is not well-formed (and an encoding problem means it isn't), a processor *must* reject it with a fatal error. If you try to bypass that, it is like driving a car with no brakes: some day you will hit a wall and things will crash, and all along you thought you were driving a real car... it at least looked like one ;)

If you cannot fix the source (i.e., some proprietary legacy home-bred XML-like format which you have to deal with regardless of what a standard dictates), it is best to reach an agreement with your source on what exactly the differences are (or can be), and to pin that down as strictly as you can. Then decide how to deal with it. Ideally, in your situation, I'd go for a single filter or a filter chain. Many existing workflow systems have that, and if yours doesn't, it's trivial to write one (but don't use XSLT for it, because that expects XML, which you haven't got yet).

After you have filtered it and transformed the wannabe XML into proper XML, you can start transforming it with XSLT. Without any hassle, really.
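
To give an idea of how small such a filter can be, here is a minimal sketch in Python. It assumes the input is otherwise valid UTF-8 and that replacing each disallowed character with a space is acceptable. The function names strip_illegal_xml_chars and parse_filtered are made up for this illustration. The character class is the complement of the XML 1.0 Char production within the Basic Multilingual Plane.

```python
import re
import xml.etree.ElementTree as ET

# Characters that XML 1.0 does not allow in a document:
# C0 controls other than tab, LF and CR, plus surrogates, U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)


def strip_illegal_xml_chars(text, replacement=" "):
    """Replace every character that XML 1.0 forbids (default: with a space)."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def parse_filtered(raw_bytes):
    """Filter step first, XML parser second (assumes UTF-8 input)."""
    text = raw_bytes.decode("utf-8")
    clean = strip_illegal_xml_chars(text)
    return ET.fromstring(clean.encode("utf-8"))


if __name__ == "__main__":
    # A backspace (0x08) inside element content, as in this thread.
    root = parse_filtered(b"<doc>a\x08b</doc>")
    print(repr(root.text))
```

The output of the filter is what you hand to your XSLT processor. The filter itself never needs to understand XML, which is the whole point of putting it in front.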

There's only one other option I can think of, which will basically come down to the same thing in the end but may be more extensible: write an encoding parser, call it "almost-utf8", register it, and set the encoding of your document to this home-bred encoding (<?xml version="1.0" encoding="almost-utf8"?>). The encoding is identical to UTF-8 except for the characters that you don't allow, which you map to a space or whatever.
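
As a rough sketch of the "register an encoding" idea, this is how it could look with Python's codec registry. Treat it as an illustration under assumptions: the helper names are hypothetical, and whether a given XML parser actually consults this registry for the encoding named in the XML declaration depends on the parser, so only the decoding side is shown here.

```python
import codecs
import re

# Same set as before: everything XML 1.0 forbids within the BMP.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)


def _almost_utf8_decode(data, errors="strict"):
    """Decode as UTF-8, then map disallowed characters to a space."""
    raw = bytes(data)
    text = raw.decode("utf-8", errors)
    return _ILLEGAL_XML_CHARS.sub(" ", text), len(raw)


def _almost_utf8_encode(text, errors="strict"):
    """Encoding is plain UTF-8; only decoding is special."""
    return text.encode("utf-8", errors), len(text)


def _search(name):
    # Newer Python versions normalise hyphens to underscores before lookup,
    # so both spellings are accepted here.
    if name in ("almost-utf8", "almost_utf8"):
        return codecs.CodecInfo(
            name="almost-utf8",
            encode=_almost_utf8_encode,
            decode=_almost_utf8_decode,
        )
    return None


codecs.register(_search)

if __name__ == "__main__":
    print(repr(b"a\x08b".decode("almost-utf8")))
```

Once registered, anything that decodes through the registry by name picks up the mapping, which is what makes this variant a bit easier to extend than a one-off filter.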

But all these methods are far from perfect compared to fixing it at the source. What is the use of a BS (BackSpace) character in your document anyway?


Cheers,
-- Abel Braaksma



Waqar Ali wrote:

Sorry, I do not want to drag this topic out, but setting CheckCharacters to false does not work. Here is what is written in the documentation:
"If the XmlReader is processing text data, it always checks that the XML names and text content are valid, regardless of the property setting. Setting CheckCharacters to false turns off character checking for character entity references."
No matter what I do, the parser does not like this character, and I have no option but to somehow take it out of the XML.
Thanks guys for your help.


