Re: [xsl] A regular expression for the content of any processing-instruction

2012-02-23 07:49:54
Costello, Roger L. wrote:
Hi Folks,

I created a regex for the content of any PI.

Is my regex correct?

Here is the structure of the content of any PI:

1. Zero or more whitespace characters. This is expressed as: \s*

2. One or more XML name characters. This is expressed as: \c+

3. Zero or more whitespace characters. This is expressed as: \s*

4. The equals sign. This is expressed as: =

5. Zero or more whitespace characters. This is expressed as: \s*

6. Either a single- or double-quote character. This is expressed as: ["']

7. One or more characters (any kind of character). This is expressed as: .+

    Note: the period allows any character. That's not correct. What is correct?

8. Either a single- or double-quote character. This is expressed as: ["']

9. Repeat (1) - (8) one or more times. This is expressed as: ( ... )+

10. Zero or more whitespace characters. This is expressed as: \s*

Here is the resulting regex:

(\s*\c+\s*=\s*["'].+["'])+\s*

Do you agree?

No. Here is an example that puts markup inside a PI:

<?xml version="1.0" encoding="utf-8" ?>
<?test <foo><bar>foobar</bar></foo> ?>
<root/>
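
To check that the document above really is well-formed, one could run it through any XML parser. The following is only a sketch using Python's standard xml.dom.minidom; the helper name first_pi is made up for illustration and is not part of any library.

```python
from xml.dom import minidom

# The example document from the message, as bytes so that the
# encoding declaration is handled by the parser.
DOC = b"""<?xml version="1.0" encoding="utf-8" ?>
<?test <foo><bar>foobar</bar></foo> ?>
<root/>"""


def first_pi(xml_bytes):
    """Return (target, data) of the first top-level PI, or None.

    Hypothetical helper for illustration only.
    """
    doc = minidom.parseString(xml_bytes)  # raises ExpatError if not well-formed
    for node in doc.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            return node.target, node.data
    return None


target, data = first_pi(DOC)
print(target)
print(repr(data))
```

The parser accepts the document and reports the markup-like text as plain PI data, with no pseudo-attributes in sight.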

The XML specification does not mandate any pseudo-attribute syntax for the content of a PI, which is what your regex seems to expect.
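
As a rough illustration of the difference, here is a sketch in Python that compares the proposed regex with the much looser rule in the XML specification's PI production, where the content may be any text that does not contain "?>". Two assumptions: Python's re module has no \c escape, so the class [\w.:-] stands in for it as an approximation, and the check for legal XML characters is left out. The function names are invented for this example.

```python
import re

# Approximation of the proposed regex. Assumption: [\w.:-] replaces the
# XML Schema \c escape (XML name characters), which Python's re lacks.
PSEUDO_ATTRS = re.compile(r"""(\s*[\w.:-]+\s*=\s*["'].+["'])+\s*""", re.DOTALL)


def looks_like_pseudo_attributes(content):
    """True if the whole PI content matches the proposed regex."""
    return PSEUDO_ATTRS.fullmatch(content) is not None


def is_legal_pi_content(content):
    """Simplified check: PI content must not contain '?>'.

    Assumption: the check that every character is a legal XML
    character is omitted here.
    """
    return "?>" not in content


xml_decl_like = 'version="1.0" encoding="utf-8" '
markup_like = "<foo><bar>foobar</bar></foo> "

for content in (xml_decl_like, markup_like):
    print(repr(content),
          looks_like_pseudo_attributes(content),
          is_legal_pi_content(content))
```

The markup-like content is legal PI content but is rejected by the pseudo-attribute regex, so the regex describes a common convention rather than the content of any PI.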

--

        Martin Honnen --- MVP Data Platform Development
        http://msmvps.com/blogs/martin_honnen/
