There is an uncanny legacy of hackers (and typists) abusing certain
special characters. One example is quoting `like this', which abuses
the grave accent as an opening quote and the apostrophe as the matching
closing quote. Read Donald E. Knuth (and others) on the subtleties of
typesetting in English and other languages; French, German and
English each have their own conventions for quotes. If you have a
dataset that consistently misuses some character, fix it with a simple
tool, but don't blame clean software.
-W
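As a minimal sketch of the kind of simple tool suggested above, the
following Python function replaces every grave accent with an
apostrophe. It assumes, as the original poster says of his dataset, that
every backtick is a mistyped quotation mark; the function name
fix_backticks is hypothetical and not taken from the thread.

```python
def fix_backticks(text: str) -> str:
    """Replace every grave accent (U+0060) with an apostrophe (U+0027).

    Assumption: no backtick in the input is intentional.
    """
    return text.replace("\u0060", "\u0027")


if __name__ == "__main__":
    print(fix_backticks("Aisha`s Song"))
```

Within a stylesheet, the XPath translate() function should be able to do
the same single-character substitution without leaving XSLT.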
On 8 April 2014 19:28, Ihe Onwuka <ihe(_dot_)onwuka(_at_)gmail(_dot_)com> wrote:
it and every other backtick in the dataset I am dealing with is a
mistyped quotation mark.
Exhibit 1: the dataset has
Aisha`s Song, but it is supposed to be referring to
http://www.imdb.com/title/tt1950067/
On Tue, Apr 8, 2014 at 6:20 PM, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
On 7 Apr 2014, at 17:07, Ihe Onwuka <ihe(_dot_)onwuka(_at_)gmail(_dot_)com> wrote:
backticks match the \w regex class which does seem at odds with the
definition of that class.
You might call it a backtick, and misuse it as a kind of quotation mark, but
its proper Unicode name and intended semantics are "grave accent", and the \w
category includes all non-spacing diacriticals.
Michael Kay
Saxonica
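To see why the backtick matches \w here, it may help to recall that the
XSD regular expression dialect (on which XSLT 2.0 regexes are based)
defines \w as every character except those in the Unicode general
categories P (punctuation), Z (separators) and C (other). The sketch
below emulates that definition in Python with the standard unicodedata
module. It does not call an XSD regex engine, and Python's own re module
defines \w differently; the helper name xsd_word_char is hypothetical.

```python
import unicodedata


def xsd_word_char(ch: str) -> bool:
    """Emulate XSD's \\w: any character not in categories P, Z or C."""
    # The first letter of the general category is the major class.
    return unicodedata.category(ch)[0] not in "PZC"


if __name__ == "__main__":
    # Print each character, its general category, and the \w verdict.
    for ch in ("`", "'", "a", " "):
        print(repr(ch), unicodedata.category(ch), xsd_word_char(ch))
```

The grave accent U+0060 is in category Sk (modifier symbol), so it falls
inside \w under this definition, whereas the apostrophe U+0027 is in
category Po (other punctuation) and is excluded.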
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--