Figured it out a little while ago.
Off Topic
I'm using Ant:
<replaceregexp match="[\cX]" replace="" flags="g">
Thanks to Abel and Michael for your input.
Mario
Quoting Abel Braaksma <abel(_dot_)online(_at_)xs4all(_dot_)nl>:
Mario Madunic wrote:
the character is and it's a control character
0x18 CAN
Unfortunately, that says it all. UTF-8 itself can encode control
characters, but XML 1.0 forbids most of them (0x18 included) whatever
the encoding, so a document that contains one is not well-formed.
the error message I receive is
SXXP0003: Error reported by XML parser: Illegal XML character: .
This is indeed illegal. The other day I accidentally used , which
is also illegal (I had it mistaken for a tab character, x09, which *is*
legal).
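To make the legal/illegal distinction concrete, here is a minimal Ruby sketch of the XML 1.0 Char production. The helper name xml10_char? is made up for illustration; it only checks a single code point and is not part of any library.

```ruby
# Hypothetical helper: true if the code point is allowed by the
# XML 1.0 "Char" production, false otherwise.
def xml10_char?(cp)
  cp == 0x9 || cp == 0xA || cp == 0xD ||
    (0x20..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) ||
    (0x10000..0x10FFFF).cover?(cp)
end

puts xml10_char?(0x09)  # tab: prints true
puts xml10_char?(0x18)  # CAN: prints false
```

Of the control characters below 0x20, only tab, line feed and carriage return pass the check.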
I've tried using Ant to clean it out, but with no luck using native2ascii
or escapeunicode
Neither will help. Escaping these characters does not make them legal: a
character reference to an illegal character is just as illegal. But you
are on the right track: use a filter to remove this character, or replace
it with something useful. I use a filter to process Microsoft *.msg
files, which have some useful lines, but the rest is control characters
and other illegal data. Here's what it might look like if you resort to
Ruby (you can call it from Ant if you like), see www.ruby-lang.org.
(spoiler warning: this is off-topic and only marginally related to xslt)
# create working dir
Dir.mkdir('trimmed') unless File.directory?('trimmed')

Dir.entries('.').each do |fn|
  # only process files with the wanted extension
  next unless fn =~ /\.yourextension\z/
  # read the complete file contents in binary mode
  data = File.binread(fn)
  # write the contents back out without the 0x18 (CAN) bytes;
  # write (not puts) so that no extra newlines are inserted
  File.open("trimmed/#{fn}.txt", 'wb') do |newfile|
    newfile.write(data.delete("\x18"))
  end
end
Just replace "yourextension" with the extension of your file and replace
"trimmed" with an output dirname of your choice. Replace ".txt" with
whatever extension you would like yourself. It runs through the current
directory and copies all files to the "trimmed" directory, with one
change: the x18 character is removed.
Of course, you can use Perl, a DOS Batch file (takes some practice),
Bash, VBScript, PHP, Grep, Awk or any other tool you'd prefer.
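If more than one control character turns up in the data, the same filter idea can be widened. The following is only a sketch: strip_xml_illegal_controls is a made-up helper name, and it assumes an ASCII-compatible encoding such as UTF-8 or Latin-1, where bytes below 0x20 never occur inside a multi-byte character. It would not be safe on UTF-16 data.

```ruby
# Hypothetical helper: remove every control byte below 0x20 that
# XML 1.0 forbids, keeping tab (0x09), line feed (0x0A) and
# carriage return (0x0D). Works on a binary copy of the input.
def strip_xml_illegal_controls(data)
  data.b.delete("\x00-\x08\x0B\x0C\x0E-\x1F")
end

p strip_xml_illegal_controls("a\x18b\tc")  # prints "ab\tc"
```

The result is returned as a binary string, so it can be written straight back to a file opened in 'wb' mode.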
HTH,
Cheers,
Abel Braaksma
http://abelleba.metacarpus.com
Can this be done or do I need to ask the client to remove it from their
data, which might not be an option?
Any help or insight would be greatly appreciated.
Marijan Madunic
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--