Hi there!
JBryant(_at_)s-s-t(_dot_)com writes:
Here's a link that deals with the issue. It gives a decent set of
cases for programmatically identifying whether a period ends a
sentence.
http://bulba.sdsu.edu/~malouf/ling571/17handout.pdf
This is an interesting text; however, I need a markup solution.
The three best alternatives so far:
* Treat every dot as an end-of-sentence unless it is immediately
followed by a <neos/> ("not end-of-sentence") element.
* Mark abbreviation dots, if followed by whitespace, with an
immediately following &#x200B; (zero-width space). [It would be
prettier to mark end-of-sentence dots this way, but that would be
much more invasive.]
* Mark abbreviations with <abbrev>e.g.</abbrev>. This is the cleanest
solution, but in my special case *much* more difficult to
implement than the other two, because I have an input stream to
convert to XML, and by the time I see the dot it's already too late
to insert the opening tag.
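To make the second alternative concrete, here is a rough Python sketch
of both sides of the convention. It is only an illustration under my
own assumptions: the function names and the list of known
abbreviations are hypothetical, and the real converter works on a
stream rather than on a complete string. The point is that the marker
is appended *after* the dot, so nothing has to be inserted before text
that has already been written.

```python
import re

ZWSP = "\u200b"  # zero-width space, used here as the "not end-of-sentence" marker


def mark_abbreviations(text, abbreviations):
    """Append ZWSP to every known abbreviation that is followed by whitespace.

    `abbreviations` is a hypothetical list such as ["e.g.", "i.e."].
    """
    for abbr in abbreviations:
        # Match the abbreviation only at a word start and only before whitespace.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?=[ \t\r\n])"
        text = re.sub(pattern, lambda m: m.group(0) + ZWSP, text)
    return text


def split_sentences(marked_text):
    """Split at whitespace that directly follows a dot.

    A dot followed by ZWSP is not directly followed by whitespace,
    so it never ends a sentence. The markers are removed from the result.
    """
    parts = re.split(r"(?<=\.)[ \t\r\n]+", marked_text)
    return [part.replace(ZWSP, "") for part in parts if part]


if __name__ == "__main__":
    marked = mark_abbreviations("We tested e.g. XSLT. It worked.", ["e.g."])
    for sentence in split_sentences(marked):
        print(sentence)
```

The first alternative would work the same way on the consuming side,
with <neos/> taking the place of the zero-width space.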
Bye,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--