procmail

Robust date code (year extraction)

2004-07-24 12:44:15
With dman's help, I've written the following code to extract the year
from a message.  Here's what I have so far:
  
  ARCHIVING=true
  
  # Extract the year from the date that the author
  # *claims* to have composed the message.
  #
  # Note: this recipe will result in a null value
  # for dates formed as XX-XX-XX.
  #
  :0
  * ^Date:.*\/(19|20)?[0-9][0-9][^a-z]+:
  * MATCH ?? ^^\/[^     ]+
  * 19^0 MATCH ?? ^^19..^^
  * 20^0 MATCH ?? ^^20..^^
  * 19^0 MATCH ?? ^^[^0].^^
  * 20^0 MATCH ?? ^^0.^^
  * MATCH ?? ^^.*(19|20)?\/[0-9][0-9]^^
  { STATED_YEAR = $=$MATCH }
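  
  # For illustration (a sketch of my reading of the recipe,
  # not verified output), a header such as
  #   Date: Sat, 24 Jul 2004 12:44:15 +0200
  # should pass through the conditions like this:
  #   1. MATCH = "2004 12:44:"   (year through the last colon)
  #   2. MATCH = "2004"          (first whitespace-free token)
  #   3. the 20^0 condition scores 20, so $= becomes 20
  #   4. MATCH = "04"            (last two digits)
  # giving STATED_YEAR = 2004.  To check this against real
  # mail, the result can be written to the procmail log:
  #
  LOG="STATED_YEAR=$STATED_YEAR
  "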
  
  # Alternative approach to extracting year from Date: field
  #
  #STATED_YEAR=`formail -x "Date: " | sed -e 's/.* \([12]\{,1\}[90]\{,1\}[0-9][0-9]\) .*/\1/' \
  #                                       -e 's/^[^0][0-9]$/19&/g' \
  #                                       -e 's/^[0][0-9]$/20&/g'`
  
  # Extract the year from the date that the 
  # message was delivered to the last server.
  #
  :0
  * ^^From .*\/(19|20)[0-9][0-9]$
  { RECEIVED_YEAR = $MATCH }
  
  # If processing mail that is missing time stamps, 
  # trust the author's stated date more.  
  #
  # (This is to circumvent formail's insertion of the date 
  # of processing in the absence of a From_ line.)
  #
  :0
  * ARCHIVING ?? true
  { SELECTED_YEAR = ${STATED_YEAR:-$RECEIVED_YEAR} }
  
  # If processing new inbound non-digest mail, 
  # trust the server's date more.
  #
  :0 E
  { SELECTED_YEAR = ${RECEIVED_YEAR:-$STATED_YEAR} }

  # The $SELECTED_YEAR variable can be used to 
  # organize archives into more reasonably 
  # sized files or folders.
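  #
  # A minimal sketch of that use, assuming a hypothetical
  # mbox naming scheme of archive-YYYY under $MAILDIR.
  # The condition guards against an empty or malformed
  # year, so such messages fall through to later recipes.
  #
  :0:
  * SELECTED_YEAR ?? ^^[0-9][0-9][0-9][0-9]^^
  archive-$SELECTED_YEAR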

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
