Re: [xsl] XPath 2.0 Regex misunderstanding

2007-01-19 14:13:06
cknell(_at_)onebox(_dot_)com wrote:
I have a date element:


I'm trying to write an XPath 2.0 Regex to winnow some of the more obvious date 
format errors. I have tried for about a half-hour, and I admit to being stumped.

I have some trouble understanding what your "passing" and "failing" are about. However, if you are trying to remove the "more obvious date format errors", I believe your "matches(...)" needs to become "not(matches(...))", since your regular expression describes what to include, not what to exclude.

That said, you can try the following (assuming American dates: MM/DD/YYYY) for matching any date, disallowing years > 2006 and allowing the format 1/2/2006:

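<xsl:variable name="dates">
   <!-- valid dates -->
   <DATE>07/18/2006</DATE>
   <DATE>07/12/2006</DATE>
   <DATE>09/25/2006</DATE>
   <DATE>10/24/2006</DATE>
   <DATE>10/18/2006</DATE>
   <DATE>10/10/2006</DATE>
   <DATE>1/2/2006</DATE>
   <!-- false dates -->
   <DATE>22/12/2006</DATE>
   <DATE>00/10/2000</DATE>
   <DATE>01/32/2006</DATE>
   <DATE>10/10/2007</DATE>
   <DATE>12/12/20006</DATE>
</xsl:variable>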

<xsl:variable name="date-regex">^(
   0?[1-9]|     <!-- 01-09 and 1-9 -->
   1[0-2]       <!-- 10, 11, 12 -->
   )/(
   0?[1-9]|     <!-- 01-09 and 1-9 -->
   [1-2]\d|     <!-- 10-29 -->
   3[01]        <!-- 30, 31 -->
   )/(
   1\d{3}|      <!-- 1000-1999 -->
   200[0-6]     <!-- 2000-2006 -->
   )$
</xsl:variable>

<xsl:for-each select="$dates/DATE">
   <xsl:value-of select="concat(., ': ')" />
   <!-- add normalize-space, because of a bug
        in earlier Saxon versions with leading space -->
   <xsl:value-of select="matches(.,
       normalize-space($date-regex), 'x')" />
   <!-- one result per line -->
   <xsl:text>&#xA;</xsl:text>
</xsl:for-each>

This outputs:
07/18/2006: true
07/12/2006: true
09/25/2006: true
10/24/2006: true
10/18/2006: true
10/10/2006: true
1/2/2006: true
22/12/2006: false
00/10/2000: false
01/32/2006: false
10/10/2007: false
12/12/20006: false
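
For anyone who wants to check the pattern outside an XSLT processor, here is a minimal sketch in Python. It assumes the three groups of the regex are joined by literal slashes and that the whole expression is anchored with ^ and $. Python's re.VERBOSE stands in for the 'x' flag (the # comments are a Python-only extra), and the names DATE_REGEX and is_valid are mine, not part of the stylesheet.

```python
import re

# Month / day / year, American order. re.VERBOSE ignores the whitespace
# in the pattern, much like the 'x' flag of XPath 2.0 matches().
DATE_REGEX = re.compile(r"""^(
    0?[1-9] |    # 01-09 and 1-9
    1[0-2]       # 10, 11, 12
    )/(
    0?[1-9] |    # 01-09 and 1-9
    [1-2]\d |    # 10-29
    3[01]        # 30, 31
    )/(
    1\d{3} |     # 1000-1999
    200[0-6]     # 2000-2006
    )$""", re.VERBOSE)


def is_valid(date: str) -> bool:
    """Return True when the string passes the date pattern."""
    return DATE_REGEX.search(date) is not None


for date in ["07/18/2006", "1/2/2006", "22/12/2006", "00/10/2000",
             "01/32/2006", "10/10/2007", "12/12/20006"]:
    print(f"{date}: {is_valid(date)}")
```

Note that the pattern only checks each field's range; it does not know that February has no 30th day.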

Here is the relevant part of the template:

<xsl:when test="matches(DATE,'[0-1][0-2]/[0-3][0-9]/2006')"><bad-date 

What your statement implies is: output a "bad-date" node when:
1) the month is one of 00, 01, 02, 10, 11, 12;
2) the day is in the range 00-39 (00, 01, ... 09, 10, 11, ... 19, 20, 21, ... 29, 30, 31, ... 39);
3) the year is 2006.

Well, I don't know much about your calendar system, but I can hardly believe you consider a date like "00/39/2006" to be correct, so that is part of your problem. I know from my own experience that regexing numeric values is a tricky business (the key is: think strings, not numbers).
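
To see what the character classes in your pattern really accept, here is a small sketch in Python, used only because it is easy to run; the pattern syntax is the same as in XPath 2.0. re.search behaves like matches() in that it succeeds when any substring matches. The helper name flagged is mine.

```python
import re

# The pattern quoted from the question, unchanged.
ORIGINAL = r"[0-1][0-2]/[0-3][0-9]/2006"


def flagged(date: str) -> bool:
    """Return True when the original pattern matches somewhere in the string."""
    return re.search(ORIGINAL, date) is not None


for date in ["00/39/2006", "12/31/2006", "07/18/2006"]:
    print(f"{date}: {flagged(date)}")
```

The nonsense date 00/39/2006 matches, while the perfectly good 07/18/2006 does not, because [0-1][0-2] describes two independent characters and not the numbers 01 through 12.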

For an article I have long wanted to write, but still haven't, I created a template that helps with regexing numeric values. Give it a number and it will simply output the right regex for you:

my:regex-from-number('376', 0)
will give:

It requires some getting used to, but I recall that Jeffrey Friedl named this "enrolling the number", or something similar. For small numbers you can easily do it by hand, but it is still hard for many mere mortals. The template is optimized for repeated digits (like 2006). The output regex works perfectly. A few notes (if you plan to use it):

Leave out this part if you require a fixed number of digits, i.e. 034 and 009. By default, 34, 9, etc. are allowed.

The input number. Repeating the number is not necessary for making a bulletproof regular expression, but it made me feel good. The larger the maximum number you need to match, the easier it gets by putting it there: you see instantly what number is being matched.

The rest speaks for itself, I believe. But call in anytime if you want some additional help. The expressions at the top of this message were taken from this template to make sure I got them right; I only made them a bit more readable.

<xsl:function name="my:regex-from-number">
   <xsl:param name="number" />
   <xsl:param name="pos" />
   <xsl:variable name="digit1" select="substring($number, $pos, 1)" />
   <xsl:variable name="digit2" select="substring($number, $pos + 1, 1)" />
   <xsl:variable name="len" select="string-length($number)" />
   <xsl:value-of select="
       if($len = $pos)
           then concat
                   if($pos - 1 le 1) then ''
                   else concat('{', $pos - 1, '}')
           if ($digit2 = '0')
           then concat
                   my:regex-from-number($number, $pos + 1)
           else concat
                   if(xs:integer($digit2) - 1 = 0) then '0'
                   else concat('[0-', xs:integer($digit2) - 1, ']'),
                   if($pos + 1 = $len) then '|'
                       if($len - $pos - 1 = 1) then '\d|'
                       else concat('\d{', $len - $pos - 1, '}|'),
                   '&#xA;', substring($number, 1, $pos), my:regex-from-number($number, $pos + 1)
               )" />
</xsl:function>
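
The general idea of turning a maximum number into a regex can also be sketched in a few lines of Python. This is an independent illustration of the digit-by-digit approach, not a port of the template above, and its output format differs from what my:regex-from-number produces. The names regex_upto and matches_upto are hypothetical.

```python
import re


def regex_upto(number: str) -> str:
    """Build a regex for the decimal strings 0..number, digit by digit."""
    length = len(number)
    alternatives = []
    for i, ch in enumerate(number):
        digit = int(ch)
        if digit == 0:
            continue  # no smaller digit can stand in this position
        # Same prefix as the maximum, one smaller digit here, anything after.
        lower = "0" if digit == 1 else f"[0-{digit - 1}]"
        rest = length - i - 1
        tail = "" if rest == 0 else r"\d" if rest == 1 else rf"\d{{{rest}}}"
        alternatives.append(number[:i] + lower + tail)
    alternatives.append(number)  # the maximum itself
    if length > 1:
        # Shorter strings such as 34 or 9; drop this for fixed-width input.
        alternatives.append(rf"\d{{1,{length - 1}}}")
    return "(" + "|".join(alternatives) + ")"


def matches_upto(number: str, candidate: str) -> bool:
    """Return True when the whole candidate string is a number <= number."""
    return re.fullmatch(regex_upto(number), candidate) is not None


print(regex_upto("376"))  # ([0-2]\d{2}|3[0-6]\d|37[0-5]|376|\d{1,2})
print(matches_upto("376", "376"), matches_upto("376", "377"))  # True False
```

In XPath 2.0 you would wrap the generated expression in ^ and $ before passing it to matches(), since matches() otherwise accepts a match anywhere in the string.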
