procmail

Re: Creating a mbox from html files

2003-10-11 10:09:16
On Sat, 11 Oct 2003, Lukreme wrote:

> I have a bunch of HTML files that I want to compile into a mbox file,
> providing a set of static headers, but extracting the Name and Date
> from the <title> tag of the file.
>
> All the Title tags are in the form of <title>A name/dd-mmm-yy</title>
> and I would like to feed these files to formail and have it create a
> set of headers something like this:

I don't think you need formail for this.  A simple shell loop should
be sufficient (use of fgrep assumes each title is on a line by itself):

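# /dev/null forces fgrep to print the "filename:" prefix even for one file
fgrep '<title>' *.html /dev/null |
while IFS='<>/' read -r filename tag subject date rest
do
  # drop the trailing colon that fgrep appends to the file name
  filename=${filename%:*}
  dd=${date%%-*}
  mmm=${date#*-}
  mmm=${mmm%-*}
  yy=${date##*-}
  # two-digit year pivot: 00-69 becomes 20yy, 70-99 becomes 19yy
  if [ "$yy" -lt 70 ]; then yyyy=20$yy; else yyyy=19$yy; fi
  # here-document body starts in column 0, so it does not depend on tabs
  cat - "$filename" <<EOF
From staticaddress@domain.com $mmm $dd 00:00:01 $yyyy
To: someaddress@domain.com
Subject: $subject
From: staticaddress@domain.com
Date: $dd $mmm $yyyy 00:00:01 +0000
Content-Type: text/html
Mime-Version: 1.0
Status: RO

EOF
  # blank line after the body, so the next From_ line starts a new message
  echo
done > yourmboxfile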
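
To see how the IFS='<>/' split carves up one line of fgrep output, here is a small sketch. The function name parse_title_line and the sample file name and title are made up for illustration; only the read statement and the colon trimming come from the loop above.

```shell
#!/bin/sh
# parse_title_line is a hypothetical helper used only to demonstrate the split.
# It takes one line in the form fgrep prints: "file.html:<title>Name/dd-mmm-yy</title>"
parse_title_line() {
  printf '%s\n' "$1" | {
    # '<', '>' and '/' each end a field; anything left over lands in $rest
    IFS='<>/' read -r filename tag subject date rest
    filename=${filename%:*}
    printf '%s|%s|%s\n' "$filename" "$subject" "$date"
  }
}

# Sample input (made up); prints the file name, subject and date joined by "|"
parse_title_line 'page1.html:<title>Jane Doe/11-Oct-03</title>'
```

Note that the split relies on the title containing exactly one "/" and on the file names containing none, which holds for *.html in the current directory.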

Note that to really be correct you need to compute the day of the week
somehow, and insert that into the From_ and Date: lines:

        From staticaddress@domain.com $DOW $mmm $dd 00:00:01 $yyyy
        ...
        Date: $DOW, $dd $mmm $yyyy 00:00:01 +0000
        ...
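
One way to get $DOW without any external tools is plain shell arithmetic. The sketch below is not from the original post: it uses the table-based method usually credited to Sakamoto, and the function name day_of_week is invented here. It assumes a shell with $(( )) arithmetic, a four-digit year, and English three-letter month names in any case.

```shell
#!/bin/sh
# day_of_week DD MMM YYYY  prints Sun, Mon, ... for a Gregorian date.
# Hypothetical helper; returns non-zero if the month name is not recognised.
day_of_week() {
  d=${1#0}                 # strip a leading zero so "08" is not taken as octal
  y=$3
  case $2 in               # month number m and its table offset t
    [Jj][Aa][Nn]) m=1  t=0 ;;
    [Ff][Ee][Bb]) m=2  t=3 ;;
    [Mm][Aa][Rr]) m=3  t=2 ;;
    [Aa][Pp][Rr]) m=4  t=5 ;;
    [Mm][Aa][Yy]) m=5  t=0 ;;
    [Jj][Uu][Nn]) m=6  t=3 ;;
    [Jj][Uu][Ll]) m=7  t=5 ;;
    [Aa][Uu][Gg]) m=8  t=1 ;;
    [Ss][Ee][Pp]) m=9  t=4 ;;
    [Oo][Cc][Tt]) m=10 t=6 ;;
    [Nn][Oo][Vv]) m=11 t=2 ;;
    [Dd][Ee][Cc]) m=12 t=4 ;;
    *) return 1 ;;
  esac
  # January and February count as part of the previous year for leap handling
  if [ "$m" -lt 3 ]; then y=$((y - 1)); fi
  n=$(( (y + y/4 - y/100 + y/400 + t + d) % 7 ))   # 0 = Sunday
  set -- Sun Mon Tue Wed Thu Fri Sat
  shift "$n"
  echo "$1"
}

day_of_week 11 Oct 2003    # prints Sat
```

Inside the loop it could be called as DOW=$(day_of_week "$dd" "$mmm" "$yyyy"). Running it in a command substitution also keeps its scratch variables out of the loop's environment.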

If you don't have a POSIX shell that can handle dd=${date%%-*} and so on,
you can play games with IFS like so:

        ifs="$IFS"
        IFS=-
        set $date
        IFS="$ifs"
        dd=$1
        mmm=$2
        yy=$3

(And similarly with IFS=: to trim the trailing colon off $filename.)
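
Spelled out, the IFS=: variant for the file name might look like this. It is a sketch under the same assumptions as the date example above; the sample value is made up, and the dummy "x" argument is an extra precaution not in the original, so that a name starting with "-" is not taken as an option to set.

```shell
#!/bin/sh
# Sample value as fgrep would print it (hypothetical file name)
filename='page1.html:'

ifs="$IFS"
IFS=:
# Unquoted on purpose: the shell splits $filename on ':' here
set x $filename
shift
IFS="$ifs"
filename=$1

echo "$filename"    # prints page1.html
```

Unlike ${filename%:*}, this keeps only the text before the first colon, so it assumes the file names themselves contain no colons.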


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
