xsl-list

Re: [xsl] analyze-string help?

2012-06-10 11:20:51
On Sun, Jun 10, 2012 at 12:04:58PM -0400, Syd Bauman scripsit:
>> I think maybe it worked because I had it at the end of the pattern
>> and then later added additional characters. So I think I went from
>> [A-Za-z0-9 -] to this [A-Za-z0-9 -,./]
>
> It was accidental? And here I thought it was a clever way to catch
> gnarly characters. The hyphen in the 2nd regexp means "from space
> (U+0020) to comma (U+002C)", i.e. expresses a range that matches the
> same characters [ !"#$%&'()*+,] matches. Many of these characters are
> a pain to type into an XSLT regexp, and thus a range like this seemed
> like a nice way to catch them.

Well, except that it's both subtle and clever, those banes of
maintainability.
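For anyone who wants to see exactly what that range swallows, here is a
small Python sketch (Python only because it is easy to run; it does not
parse XSD/XPath regex syntax). The two classes are written out by hand as
sets of characters, reading " -," as a range the way Syd describes. It
also shows something easy to miss: once the hyphen became a range
operator, the literal "-" that the first class matched is no longer in
the second one.

```python
import string

# Letters and digits shared by both classes: A-Z, a-z, 0-9
ALNUM = set(string.ascii_letters + string.digits)

# [A-Za-z0-9 -] : the hyphen is last in the class, so it is a literal U+002D
before = ALNUM | {" ", "-"}

# [A-Za-z0-9 -,./] : " -," is now a range from U+0020 (space) to U+002C (comma)
space_to_comma = {chr(c) for c in range(0x20, 0x2C + 1)}
after = ALNUM | space_to_comma | {".", "/"}

print(len(space_to_comma))              # 13
print("".join(sorted(space_to_comma)))  # the 13 characters, space first
print("-" in before, "-" in after)      # True False
```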

One of the things I am very glad went into XSLT regular expressions is
support for the Unicode character categories; if you want punctuation,
for example, it's "\p{P}", so I might write the provided atom definition
as:

[\p{L}\p{Nd}\p{P}]

("\p{L}" is the Unicode letter category, "\p{Nd}" the decimal-digit
subcategory of numbers, and "\p{P}" the punctuation category.)
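A rough way to check what that class admits outside an XSLT processor is
Python's unicodedata.category(), which returns the same two-letter
general categories (Lu, Nd, Po, ...). This is only an approximation:
Python's Unicode tables may be a different version from your processor's.
Compared with the ASCII part of the original [A-Za-z0-9 -,./], it drops
exactly three characters, because space is Zs, "$" is Sc and "+" is Sm
rather than punctuation. It also admits accented letters and punctuation
the original did not, such as "-" and "_".

```python
import string
import unicodedata

def in_category_class(ch):
    # Approximates [\p{L}\p{Nd}\p{P}]: any letter (L*), decimal digit (Nd),
    # or punctuation (P*) according to Python's Unicode database
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat == "Nd" or cat.startswith("P")

# The original class [A-Za-z0-9 -,./], written out as a set
old_class = (set(string.ascii_letters + string.digits)
             | {chr(c) for c in range(0x20, 0x2C + 1)}
             | {".", "/"})

dropped = sorted(ch for ch in old_class if not in_category_class(ch))
print(dropped)                                               # [' ', '$', '+']
print(in_category_class("\u00e9"), in_category_class("_"))   # True True
```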

An upper-case P negates the category, so you can neatly express things
like "\P{Pd}", "any character that is not some kind of dash".
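The negated form is simply the complement of the category. In the same
Python approximation (unicodedata.category() standing in for the
processor's Unicode tables), \P{Pd} is "category is not Pd"; keeping only
those characters strips every kind of dash, much as
replace($s, '\p{Pd}', '') would in XPath.

```python
import unicodedata

def not_dash(ch):
    # \P{Pd}: every character whose general category is not Pd (dash punctuation)
    return unicodedata.category(ch) != "Pd"

# U+2013 EN DASH and U+2014 EM DASH are Pd, as is U+002D HYPHEN-MINUS
text = "pre-1990 \u2013 post\u20142000"
print("".join(ch for ch in text if not_dash(ch)))  # pre1990  post2000
```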

In my ideal world the syntax would evolve so that you could constrain
the categories, for example "\p{Pd except '-'}" for "any character that
is some kind of dash except U+002D HYPHEN-MINUS", since that would make
this even more useful for functions that take regular expressions, such
as tokenize().
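Here is a sketch of what such a constraint would mean, again in Python
with unicodedata.category() standing in for the processor's tables.
dash_except_hyphen_minus and tokenize_on are made-up names for
illustration, and tokenize_on is only a rough analogue of tokenize() with
a one-character separator class. As an aside, the XML Schema regex
grammar that XPath regexes build on does define character-class
subtraction (e.g. [a-z-[aeiou]]), so [\p{Pd}-[\-]] may already express
this; worth checking against your processor.

```python
import unicodedata

def dash_except_hyphen_minus(ch):
    # Hypothetical "\p{Pd except '-'}": dash punctuation other than U+002D
    return unicodedata.category(ch) == "Pd" and ch != "-"

def tokenize_on(text, is_sep):
    # Split text at every character for which is_sep() is true.
    # Adjacent separators yield an empty token, as tokenize() does.
    tokens, current = [], []
    for ch in text:
        if is_sep(ch):
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens

print(tokenize_on("well-known\u2013ad\u2014hoc", dash_except_hyphen_minus))
# ['well-known', 'ad', 'hoc']
```

The hyphen inside "well-known" survives because it is the one Pd
character excluded from the separator set.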

-- Graydon

-- 
Graydon Saunders        XML tools and processes for information delivery.
graydon(_at_)marost(_dot_)ca

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--