Hello group,
I have a content set about books which I am currently transforming. I have
solved most of my problems here with all your kind help. I also have to
clean some of the text of weird characters that appear to follow a pattern.
I am not sure what causes them, so I will show you some examples; perhaps
some of you know what this could be, or whether it is some kind of encoding
problem. Perhaps there is something I can do in general to fix the entire
collection by changing the encoding of the data (I am not an expert in
that, so please accept my apologies if I am talking rubbish here).
Effect 1: There are "q" letters in front of sentences (e.g. "...it has to
take place in the Old West. qAnd make sure that ..."). This effect is
scattered over thousands of places all over the collection. I am not sure
if this is an encoding problem; perhaps we can discuss it here. However, I
think I could deal with it by recognising the character with a regular
expression in XSLT and deleting it. The regular expression should have the
following structure: first a dot followed by a space (to recognise the
beginning of a sentence), then a lowercase q, and then any of the 26
capital letters. How can I do that in XSLT?
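To make the structure of that expression concrete, here is a minimal sketch in Python rather than XSLT (a swapped-in language, chosen only because it is easy to run). The names STRAY_Q and remove_stray_q are made up for illustration. I would expect the same pattern to carry over to the replace() function in XSLT 2.0, but I have not tested that.

```python
import re

# A dot, one space, a lowercase "q", then one capital letter A-Z.
# The capital letter is captured so it can be kept in the output.
STRAY_Q = re.compile(r"\. q([A-Z])")


def remove_stray_q(text: str) -> str:
    """Delete a stray 'q' that sits directly in front of a new sentence."""
    return STRAY_Q.sub(r". \1", text)


sample = "...it has to take place in the Old West. qAnd make sure that ..."
print(remove_stray_q(sample))
```

A "q" that is a normal part of a word (as in "Iraq. The") is left alone, because the pattern requires the q to come after the dot and space and directly before a capital letter.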
Effect 2: I have question marks at places where I would expect to find a
special character like ' or £ or $. Example: "...which was the Company???s
first drama production in ...". Second example: "??6 to ??8" in a price tag
where it should be "£6 to £8" or "$6 to $8".
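One guess, and it is only a guess: the number of question marks matches the number of bytes these characters take in UTF-8 (two for £, three for a curly apostrophe), so perhaps each non-ASCII byte was replaced by "?" somewhere along the way. The Python sketch below only simulates that assumption; the function name simulate_damage and the choice of the curly apostrophe (U+2019) are mine, not something I know about the data.

```python
def simulate_damage(text: str) -> str:
    """Encode as UTF-8, then turn every non-ASCII byte into '?'.

    This models the ASSUMED failure only; it is not a repair.
    """
    raw = text.encode("utf-8")
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)


print(simulate_damage("£6 to £8"))        # pound sign is 2 bytes in UTF-8
print(simulate_damage("Company\u2019s"))  # U+2019 is 3 bytes in UTF-8
```

If this assumption holds, the "$6 to $8" reading would not fit, since $ is plain ASCII and would have survived unchanged.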
What do you think about that?
Best Regards,
Karl