xsl-list

[xsl] possible workarounds to process files with invalid character encoding ...

2008-12-12 16:14:16
Hello,

I'm trying to transform a text file with XSLT using the unparsed-text() and
tokenize() functions. Unfortunately, the text file contains characters
encoded with a custom encoding scheme that is not Unicode-compliant. So, as
expected, my Saxon processor (version 9.1.0.3 Basic) throws a
MalformedInputException when I try to read the file.

My question is whether there are any workarounds that would let Saxon
process the file anyway, for example:

(1) Writing some kind of plugin that lets Saxon support encodings that
are not Unicode-compliant;

(2) Adding metadata to the input file that Saxon or another XSLT
processor can handle, and that specifies a mapping from the character
codes of the custom encoding to the appropriate code points of a
Unicode-compliant encoding.

And if such a workaround exists, is it even worth trying to implement
it, or would one be better off preprocessing the file with a custom
Java program, or even modifying the program that creates these text
files so that it uses a Unicode-compliant encoding scheme rather than
its own custom one?
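
To make the preprocessing option concrete, here is a minimal sketch of
what I have in mind. It assumes the custom scheme is a single-byte
encoding that agrees with ASCII below 0x80; the class name, the two
mapping entries, and the command-line arguments are made up for
illustration and would have to be replaced with the real table.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class CustomEncodingToUtf8 {

    // Hypothetical table: byte value in the custom encoding -> Unicode code point.
    // These two entries are placeholders, not the real mapping.
    private static final Map<Integer, Integer> CUSTOM_TO_UNICODE = new HashMap<>();
    static {
        CUSTOM_TO_UNICODE.put(0x80, 0x20AC); // assumed: 0x80 stands for the euro sign
        CUSTOM_TO_UNICODE.put(0x81, 0x00E4); // assumed: 0x81 stands for a-umlaut
    }

    // Decodes bytes in the (assumed single-byte) custom encoding into a Java String.
    static String decode(byte[] input) {
        StringBuilder out = new StringBuilder(input.length);
        for (byte b : input) {
            int value = b & 0xFF; // treat the byte as unsigned
            Integer mapped = CUSTOM_TO_UNICODE.get(value);
            if (mapped != null) {
                out.appendCodePoint(mapped);
            } else if (value < 0x80) {
                out.append((char) value); // assumed: low range is plain ASCII
            } else {
                out.append('\uFFFD'); // unknown byte: Unicode replacement character
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CustomEncodingToUtf8 <input> <output>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // Write the decoded text out again as UTF-8.
        Files.write(Paths.get(args[1]), decode(raw).getBytes(StandardCharsets.UTF_8));
    }
}
```

The stylesheet would then read the converted file with
unparsed-text($href, 'UTF-8') instead of the original one.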

What are your opinions?

Best regards,

Matthias Einbrodt




