xsl-list

Re: [xsl] Parse a date - exslt:parse-date in Saxon 6

2014-10-22 11:35:22

Following a follow-up question about the if-condition-return idiom, I managed 
to get something working well enough for my requirements.

As promised, here's some XSLT 1 code that takes a date expressed in any one of 
a number of formats and writes the result in a given format.  (XSLT 1 because 
it goes in a DocBook customization, which needs to be XSLT 1.)
The list of supported formats is quite limited, but sufficient for my purposes.

As it's XSL 1 there's no regular expression usage.  The result is more verbose 
than I'd expect to be able to achieve in a language I'm more familiar with (eg 
C++).
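
For anyone unfamiliar with the idiom, the usual XSLT 1 substitute for a 
regular expression is to map every digit to a placeholder with translate() and 
compare the resulting "shape" against a literal.  A minimal sketch follows; the 
stylesheet wrapper and the template name is.iso.date are made up for 
illustration and are not part of the posted code.  It checks shape only, so 
'2014-99-99' would pass, as would an input that already contains '#'.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: outputs 1 if $string has the shape YYYY-MM-DD, else 0. -->
  <xsl:template name="is.iso.date">
    <xsl:param name="string" select="''"/>
    <!-- Every digit becomes '#', so '2014-10-22' becomes '####-##-##'. -->
    <xsl:variable name="shape"
        select="translate($string, '0123456789', '##########')"/>
    <xsl:choose>
      <xsl:when test="$shape = '####-##-##'">1</xsl:when>
      <xsl:otherwise>0</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Example call with a literal date; this outputs 1. -->
  <xsl:template match="/">
    <xsl:call-template name="is.iso.date">
      <xsl:with-param name="string" select="'2014-10-22'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```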

The first template 'format.meta.date' calls the two templates for the two 
general sets of formats supported.
'parse.date.1' handles dates with alphabetic months.
'parse.date.2' handles numeric months.
All dates are in UK/European format (d/m/y).  If you want US (m/d/y) it should 
be straightforward enough to change.
The output is a date in one of epub3's required formats -  YYYY, YYYY-MM, 
YYYY-MM-DD.
Two digit years are presumed to be in the past.  They are considered to be 21st 
century if lower than the current year mod 100, 20th century if higher.
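
The two digit year rule can be sketched in isolation as below.  The template 
name expand.year and its this-year parameter are hypothetical; the parameter 
defaults to 2014 so that the sketch runs without any extension function.  A 
year equal to the current year mod 100 is treated here as 21st century, which 
is an assumption of this sketch.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: expands a two digit year to four digits. -->
  <xsl:template name="expand.year">
    <xsl:param name="year"/>
    <!-- Assumption: supplied by the caller instead of being read from the clock. -->
    <xsl:param name="this-year" select="2014"/>
    <xsl:variable name="pivot" select="$this-year mod 100"/>
    <xsl:choose>
      <!-- Three or four digit years are left alone. -->
      <xsl:when test="$year &gt;= 100"><xsl:value-of select="$year"/></xsl:when>
      <!-- Up to and including the current year: 21st century. -->
      <xsl:when test="$year &lt;= $pivot"><xsl:value-of select="$year + 2000"/></xsl:when>
      <!-- Anything later would be in the future, so assume 20th century. -->
      <xsl:otherwise><xsl:value-of select="$year + 1900"/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Example calls; with the default this-year this outputs "2009 1996". -->
  <xsl:template match="/">
    <xsl:call-template name="expand.year">
      <xsl:with-param name="year" select="9"/>
    </xsl:call-template>
    <xsl:text> </xsl:text>
    <xsl:call-template name="expand.year">
      <xsl:with-param name="year" select="96"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```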

The original had some xsl:message lines in to aid debugging - I may have 
mis-edited them in the course of editing for this message, apologies if so.


Regards,
Richard.


<!-- EPUB3 meta date should be of the form:  YYYY, YYYY-MM or YYYY-MM-DD -->
<xsl:template name="format.meta.date">
  <xsl:param name="string" select="''"/>
  <xsl:param name="node" select="."/>

  <!--
   A quick search has shown the following formats in use:
   28 April 2009, 19 November 2003, 10 December 2003,
   16/05/2012, 10/06/2014, 22/7/2010, 12/8/2010,
   31 Mar 2011, 09 Dec 2010, 04 Nov. 09, 29 Oct. 09, 14 Oct. 09, Feb 09

   Categorizing as follows (after normalize-space):
   "dd? mmm(m{0,6}).? yy(yy)?"
   "dd?/mm?/yy(yy)?"
   "mmm(m{0,6}).? yy(yy)?"
   XSLT 1 has no regular expressions, so these patterns can't be used directly.
  -->
  <xsl:variable name="normalized" select="translate($string, '0123456789', '##########')"/>
  <!-- 1 if the string is already in one of the EPUB3 forms, otherwise 0. -->
  <xsl:variable name="date.ok">
    <xsl:choose>
      <xsl:when test="string-length($string) = 4 and $normalized = '####'">1</xsl:when>
      <xsl:when test="string-length($string) = 7 and $normalized = '####-##'">1</xsl:when>
      <xsl:when test="string-length($string) = 10 and $normalized = '####-##-##'">1</xsl:when>
      <xsl:otherwise>0</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
 <!-- In case it isn't one of the permitted formats, see if we can parse it as
      one of our own formats. -->
 <xsl:variable name="string.1">
   <xsl:call-template name="parse.date.1" >  <xsl:with-param name="string" 
select="$string" />   </xsl:call-template>
 </xsl:variable>
 <xsl:variable name="string.2">
   <xsl:call-template name="parse.date.2" >  <xsl:with-param name="string" 
select="$string" />   </xsl:call-template>
 </xsl:variable>
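 <xsl:variable name="new.string">
   <xsl:choose>
    <!-- Already in a permitted format: pass it through unchanged. -->
    <xsl:when test="$date.ok = 1" >    <xsl:value-of select="$string"/>  </xsl:when>
    <xsl:when test="string-length( $string.1 ) > 0" >    <xsl:value-of select="$string.1"/>  </xsl:when>
    <xsl:when test="string-length( $string.2 ) > 0" >    <xsl:value-of select="$string.2"/>  </xsl:when>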
    <xsl:otherwise>
     <xsl:message>
      <xsl:text>WARNING: wrong metadata date format: '</xsl:text>
      <xsl:value-of select="$string"/>
      <xsl:text>' in element </xsl:text>
      <xsl:value-of select="local-name($node/..)"/>
      <xsl:text>/</xsl:text>
      <xsl:value-of select="local-name($node)"/>
      <xsl:text>. It must be in one of these forms: </xsl:text>
      <xsl:text>YYYY, YYYY-MM, YYYY-MM-DD,</xsl:text>
      <xsl:text>DD MMM(...) (YY)YY, DD/MM/(YY)YY.</xsl:text>
     </xsl:message>
     <xsl:value-of select="''" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
 <!-- Return the reformatted date (empty if it could not be parsed). -->
 <xsl:value-of select="$new.string"/>
</xsl:template>

<xsl:template name="parse.date.1">
  <xsl:param name="string" select="''"/>
  <!--   Parse the following formats.
   "dd? mmm(m{0,6}).? yy(yy)?"
   "mmm(m{0,6}).? yy(yy)?"
  -->
 <!--  Months have three (May) to nine (September) letters.  Optional dot. -->
 <xsl:variable name="normalized"  select="translate($string, '0123456789', 
'##########')"/>
 <!-- normalize spaces. So "  Dec   96  ",  " 6  dec.   96  " etc all become 
"Dec 96" or "6 dec. 96" -->
 <xsl:variable name="normalized2"       select="normalize-space($normalized)"/>
 <!-- force to lower case -->
 <xsl:variable name="normalized3"   select="translate($normalized2, 
'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' )"/>
 <!-- strip numerics. Giving "may ", " dec. " etc. -->
 <xsl:variable name="normalized4"   select="translate($normalized3, '#', '' )"/>
 <!-- normalize spaces again. Giving "may", "dec." -->
 <xsl:variable name="month-raw"   select="normalize-space($normalized4)"/>
 <!-- remove trailing dot, if present. Giving "may", "sept" etc -->
 <xsl:variable name="month-dotless" >
   <xsl:choose>
    <xsl:when test="substring($month-raw, string-length($month-raw), 1) = '.'">
      <xsl:value-of select="substring($month-raw, 1, string-length($month-raw) 
- 1)" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$month-raw" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>

 <!-- categorize. alphabetics become '%' -->
 <xsl:variable name="normalized7" select="translate($month-dotless, 
'abcdefghijklmnopqrstuvwxyz', '%%%%%%%%%%%%%%%%%%%%%%%%%%' )"/>
 <!-- By this point we have month names in isolation, without dots, length as 
given.    So expecting '%' only, three to nine times. -->
 <xsl:variable name="normalized8"   select="translate($normalized7, '%', '' )"/>
 <!-- cleared alphabetics, so expect nothing left. -->
 <!-- Use select so this is a real boolean; a variable with a body is a result
      tree fragment, which always tests as true. -->
 <xsl:variable name="date.ok.1" select="string-length($normalized8) = 0"/>
<!-- <xsl:if test="not($date.ok.1)">
   <xsl:message>
    <xsl:text>WARNING: unrecognized month (non-alphabetics): '</xsl:text>
    <xsl:value-of select="$month-dotless"/>
    <xsl:text>' in '</xsl:text>
    <xsl:value-of select="$string"/>
    <xsl:text>'.</xsl:text>
   </xsl:message>
 </xsl:if>-->
 <!-- check range of lengths. -->
 <!-- A select gives a real boolean; a 1/0 body would always test as true. -->
 <xsl:variable name="date.ok.2" select="string-length($normalized7) >= 3 and string-length($normalized7) &lt;= 9"/>
 <!-- extract three letter prefix of month name.
   month-dotless has the month in lower case, whatever length it was given. -->
 <xsl:variable name="normalized9" select="substring($month-dotless, 1, 3)" />
 <!-- check three letter version is valid.   Look it up in the reference set. 
-->
 <xsl:variable name="months">janfebmaraprmayjunjulaugsepoctnovdec</xsl:variable>
 <xsl:variable name="month-valid" select="contains($months, $normalized9)" />
 <!-- We found it; now work out the index at which we found it. -->
 <xsl:variable name="month-before" select="substring-before($months, 
$normalized9)" />
 <xsl:variable name="month-index" select="(string-length( $month-before ) div 
3) +1" />
 <xsl:variable name="month-name">
   <xsl:choose>
    <xsl:when test="$month-index = 1">january</xsl:when>
    <xsl:when test="$month-index = 2">february</xsl:when>
    <xsl:when test="$month-index = 3">march</xsl:when>
    <xsl:when test="$month-index = 4">april</xsl:when>
    <xsl:when test="$month-index = 5">may</xsl:when>
    <xsl:when test="$month-index = 6">june</xsl:when>
    <xsl:when test="$month-index = 7">july</xsl:when>
    <xsl:when test="$month-index = 8">august</xsl:when>
    <xsl:when test="$month-index = 9">september</xsl:when>
    <xsl:when test="$month-index = 10">october</xsl:when>
    <xsl:when test="$month-index = 11">november</xsl:when>
    <xsl:when test="$month-index = 12">december</xsl:when>
   </xsl:choose>
 </xsl:variable>
 <!-- We now have the full name of the month.   Check that if a longer form was 
given it matches the full name. -->
 <xsl:variable name="month-valid-full" select="$month-dotless = substring( 
$month-name, 1, string-length( $month-dotless ) )" />
 <xsl:variable name="month-string-3" select="format-number( $month-index, '00' 
)" />
 <!-- Now get the day and year. -->
 <!-- force to lower case -->
 <xsl:variable name="normalized10"    select="translate($string, 
'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' )"/>
 <xsl:variable name="day-string" select="substring-before( $normalized10, 
$month-dotless )" />
 <xsl:variable name="day-string-2" select="normalize-space( $day-string )" />
 <xsl:variable name="day-num" select="number( $day-string-2 )" />
 <xsl:variable name="day-string-3" select="format-number( $day-num, '00' )" />
 <xsl:variable name="year-string" select="substring-after( $normalized10, 
$month-raw )" />
 <xsl:variable name="year-string-2" select="normalize-space( $year-string )" />
 <xsl:variable name="this-year" select="date:year()" />
 <xsl:variable name="this-year-in-century" select="$this-year mod 100" />
 <xsl:variable name="year-num" select="number( $year-string-2 )" />
 <xsl:variable name="year-num-2">
   <xsl:choose>
    <xsl:when test="$year-num &lt;= $this-year-in-century">
      <xsl:value-of select="$year-num + 2000" />
    </xsl:when>
    <xsl:when test="$year-num > $this-year-in-century and $year-num &lt; 100">
      <xsl:value-of select="$year-num + 1900" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$year-num" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
 <xsl:variable name="year-string-3" select="format-number( $year-num-2, '0000' 
)" />
 <!-- Return something. -->
 <xsl:variable name="return" >
   <xsl:if test="$date.ok.1 and $date.ok.2 and $month-valid and 
$month-valid-full">
    <xsl:choose>
      <!-- Test the normalized day so stray whitespace isn't mistaken for a day. -->
      <xsl:when test="string-length( $day-string-2 ) > 0">
       <xsl:variable name="result">
         <xsl:value-of select="$year-string-3" />-<xsl:value-of select="$month-string-3" />-<xsl:value-of select="$day-string-3" />
       </xsl:variable>
       <xsl:value-of select="$result" />
      </xsl:when>
      <xsl:when test="string-length( $day-string-2 ) = 0">
       <xsl:variable name="result">
         <xsl:value-of select="$year-string-3" />-<xsl:value-of 
select="$month-string-3" />
       </xsl:variable>
       <xsl:value-of select="$result" />
      </xsl:when>
    </xsl:choose>
   </xsl:if>
 </xsl:variable>
 <xsl:value-of select="$return" />
</xsl:template>

<xsl:template name="parse.date.2">
  <xsl:param name="string" select="''"/>
  <!--
   Parse the following formats.
   "dd?/mm?/yy(yy)?"
   ie.
   dd/mm/yyyy
   mm/yyyy
     (where dd may be d, mm may be m, yyyy may be yy)
  -->
 <!-- Turn numbers to # and remove all spaces. -->
 <xsl:variable name="normalized"  select="translate($string, '0123456789 ', 
'##########')"/>
 <!-- should now be '#/#/##' '#/#/####' '#/##/##' '#/##/####' '##/#/##' 
'##/#/####' '##/##/##' '##/##/####' -->
 <!-- strip numerics. Giving "//" or "/" -->
 <xsl:variable name="normalized2"   select="translate($normalized, '#', '' )"/>
 <!-- cleared numerics, so expect "//". -->
 <xsl:variable name="date.check.1">
   <xsl:choose>
    <xsl:when test="$normalized2 = '//'">2</xsl:when>
    <xsl:when test="$normalized2 = '/'">1</xsl:when>
    <xsl:otherwise>0</xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
<!-- <xsl:if test="$date.check.1 = 0">
   <xsl:message>
    <xsl:text>WARNING: unrecognized format (n/n/n or n/n): '</xsl:text>
    <xsl:value-of select="$normalized2"/>
    <xsl:text>' in '</xsl:text>
    <xsl:value-of select="$string"/>
    <xsl:text>'.</xsl:text>
   </xsl:message>
 </xsl:if>-->
 <!-- strip slashes. Giving "####" to "########".  -->
 <xsl:variable name="normalized3"   select="translate($normalized, '/', '' )"/>
 <!-- check range of lengths. -->
 <!-- A select gives a real boolean; a true/false body would always test as true. -->
 <xsl:variable name="date.ok.2" select="string-length($normalized3) >= 4 and string-length($normalized3) &lt;= 8"/>
 <xsl:variable name="before-slash-1" select="substring-before($string, '/')" />
 <xsl:variable name="after-slash-1" select="substring-after($string, '/')" />
 <xsl:variable name="before-slash-2" select="substring-before($after-slash-1, 
'/')" />
 <xsl:variable name="after-slash-2" select="substring-after($after-slash-1, 
'/')" />
  <!-- Work out which is which, ie dd/mm/yy(yy) or mm/yy(yy). -->
  <xsl:variable name="year-num" >
 <xsl:choose>
   <xsl:when test="($date.check.1 = 2) and $date.ok.2">
    <!-- dd/mm/yy(yy) -->
    <xsl:variable name="result">
      <xsl:value-of select="number( $after-slash-2 )" />
    </xsl:variable>
    <xsl:value-of select="$result" />
   </xsl:when>
   <xsl:when test="($date.check.1 = 1) and $date.ok.2">
    <!-- mm/yy(yy) -->
    <xsl:variable name="result">
      <xsl:value-of select="number( $after-slash-1 )" />
    </xsl:variable>
    <xsl:value-of select="$result" />
   </xsl:when>
 </xsl:choose>
  </xsl:variable>
  <xsl:variable name="month-num" >
   <xsl:choose>
     <xsl:when test="($date.check.1 = 2) and $date.ok.2">
      <!-- dd/mm/yy(yy) -->
      <xsl:variable name="result">
        <xsl:value-of select="number( $before-slash-2 )" />
      </xsl:variable>
      <xsl:value-of select="$result" />
     </xsl:when>
   <xsl:when test="($date.check.1 = 1) and $date.ok.2">
    <!-- mm/yy(yy) -->
    <xsl:variable name="result">
      <xsl:value-of select="number( $before-slash-1 )" />
    </xsl:variable>
    <xsl:value-of select="$result" />
   </xsl:when>
 </xsl:choose>
  </xsl:variable>
 <xsl:variable name="this-year" select="date:year()" />
 <xsl:variable name="this-year-in-century" select="$this-year mod 100" />
 <xsl:variable name="year-num-2">
   <xsl:choose>
    <xsl:when test="$year-num &lt;= $this-year-in-century">
      <xsl:value-of select="$year-num + 2000" />
    </xsl:when>
    <xsl:when test="$year-num > $this-year-in-century and $year-num &lt; 100">
      <xsl:value-of select="$year-num + 1900" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$year-num" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
 <xsl:variable name="day-num" >
  <xsl:choose>
    <xsl:when test="($date.check.1 = 2) and $date.ok.2">
     <!-- dd/mm/yy(yy) -->
     <xsl:variable name="result">
       <xsl:value-of select="number( $before-slash-1 )" />
     </xsl:variable>
     <xsl:value-of select="$result" />
      </xsl:when>
      <xsl:when test="($date.check.1 = 1) and $date.ok.2">
     <!-- mm/yy(yy) -->
     <xsl:variable name="result" select="0" />
     <xsl:value-of select="$result" />
    </xsl:when>
  </xsl:choose>
 </xsl:variable>
 <xsl:variable name="day-string" select="format-number( $day-num, '00' )" />
 <xsl:variable name="month-string" select="format-number( $month-num, '00' )" />
 <xsl:variable name="year-string" select="format-number( $year-num-2, '0000' )" 
/>
 <!-- Return something. We've already worked out which part is the year and
      which is the month. -->
 <xsl:variable name="return" >
   <xsl:choose>
    <xsl:when test="($date.check.1 = 2) and $date.ok.2 and $day-num > 0">
      <!-- dd/mm/yy(yy) -->
      <xsl:variable name="result">
     <xsl:value-of select="$year-string" />-<xsl:value-of 
select="$month-string" />-<xsl:value-of select="$day-string" />
      </xsl:variable>
      <xsl:value-of select="$result" />
    </xsl:when>
    <xsl:when test="($date.check.1 = 1) and $date.ok.2 and $day-num = 0">
      <!-- mm/yy(yy) -->
      <xsl:variable name="result">
     <xsl:value-of select="$year-string" />-<xsl:value-of 
select="$month-string" />
      </xsl:variable>
      <xsl:value-of select="$result" />
    </xsl:when>
   </xsl:choose>
 </xsl:variable>
 <xsl:value-of select="$return" />
</xsl:template>
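
Both parse templates call date:year(), which is the EXSLT dates-and-times 
function, so the enclosing stylesheet has to bind the date prefix.  A minimal 
sketch of the wrapper, assuming the processor in use implements that EXSLT 
module.  The match pattern and mode name in the usage template are 
hypothetical, and exclude-result-prefixes is optional (it only keeps the 
namespace out of the output).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:date="http://exslt.org/dates-and-times"
    exclude-result-prefixes="date">

  <!-- format.meta.date, parse.date.1 and parse.date.2 go here -->

  <!-- Hypothetical usage: reformat the text of a date element. -->
  <xsl:template match="date" mode="example">
    <xsl:call-template name="format.meta.date">
      <xsl:with-param name="string" select="normalize-space(.)"/>
      <xsl:with-param name="node" select="."/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```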





Richard Kerry
BNCS Engineer, SI SOL Telco & Media Vertical Practice
T: +44 (0)20 3618 2669
M: +44 (0)7812 325518
G300, Stadium House, Wood Lane, London, W12 7TA
richard(_dot_)kerry(_at_)atos(_dot_)net



________________________________________
From: Martin Honnen martin(_dot_)honnen(_at_)gmx(_dot_)de 
[xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com]
Sent: 12 June 2014 19:14
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Parse a date - exslt:parse-date in Saxon 6

Kerry, Richard richard(_dot_)kerry(_at_)atos(_dot_)net wrote:

> I can see that a suitable parser function (parse-date) is defined in
> Exslt but it isn't clear whether it is already available to me or how to
> get it into use if not.  Actually according to the exslt documentation
> it is definitely not available in any XSLT processor but there
> are JavaScript and Msxsl implementations available.
>
> Can someone advise how I can get this to work ?
>
> Can I get Saxon 6 to call a JavaScript function ?

As far as I know there is no way with Saxon 6 to use Javascript to
implement extension functions.
