My verdict: If the 'lt' of Michael was on purpose, I still
want to grant him the "Best Original Software Snippet Based
On Any XXX* Language" ;-)
I think the original problem wasn't especially well specified, and I was
well aware that retaining all the characters below 127 while losing those
above was a pretty crude cutoff. In the light of that, the decision whether
to keep or lose 127 itself is neither here nor there. Almost certainly a
better solution is to discard only the characters in particular
Unicode groups, which should be possible to achieve using replace() with
appropriately selected regular expressions. The basic idea I was trying to
propose was using normalize-unicode() to translate into a decomposed normal
form and then discarding the combining (modifier) characters, and I think
that's basically a sound approach.
In fact a better solution might be
replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')
but I'm sure that could be improved further.
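
For anyone who wants to try the idea, here is a minimal XSLT 2.0 sketch of
the decompose-then-strip approach. The function name f:strip-marks and the
namespace urn:example:strip are made up for illustration, and it assumes a
processor that supports the 'NFKD' normalization form (the spec only
requires 'NFC'). Note the lowercase \p{Mn}, which matches nonspacing marks;
uppercase \P{Mn} is the complement and would match everything else.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:strip"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose the string, then delete every
       nonspacing mark (Unicode category Mn) left by the decomposition. -->
  <xsl:function name="f:strip-marks" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence
        select="replace(normalize-unicode($in, 'NFKD'), '\p{Mn}', '')"/>
  </xsl:function>

  <!-- Demo: run against any input document; the input is ignored. -->
  <xsl:template match="/">
    <xsl:value-of select="f:strip-marks('café')"/>
  </xsl:template>

</xsl:stylesheet>
```

With 'café' the é decomposes to e followed by U+0301 (combining acute
accent), the accent is removed, and what remains is 'cafe'. Because NFKD is
a compatibility decomposition, it also rewrites things like ligatures and
superscripts, so 'NFD' may be the better choice if only the accents should
go.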
Michael Kay
http://www.saxonica.com/