Formail and the -f flag

2001-12-11 02:50:29
I understood that formail's -f flag would prevent it from adding the
mbox style "From " line to messages, but that does not appear to be
happening here.

I'm reading an incoming NNTP data stream with this command line:

$BASE/run_suck \
 |formail -m5 -f -d -e -s procmail -m $BASE/.proc_nntp_split_test

(run_suck is a small wrapper I use to run `suck'.)  Most of the
messages coming thru (but not all) begin with the NNTP `Path: '
header.  I understood the -d flag would allow formail to handle them,
and it does, at least in part.

After some heavy doses of help from Martin McCarthy, I have worked out
a system that seems like it should have a chance of working.

I wanted to leave the messages as they come in except for some edits,
hence the -f flag.  But I still see the `From ' line being prepended
to them.

If I run suck without piping thru formail/procmail, no `From ' line
appears.

Oddly, in the control file described below the `From ' line is not
prepended, but in every one of the actual `delivery' files the mbox
`From ' line is there.

The formail man page clearly says this should not happen:
 -f
   Force formail to simply pass along any non-mailbox format (i.e.,
   don't generate a `From ' line as the first line).

So there must be something else wrong with my command line.  Or
possibly .procmailrc. I don't really think it should involve
.procmailrc but I'm not that conversant with how this all works.

The actual .procmailrc in use with this setup is included below at the
end.  I would appreciate any pointers or comments about its poor
design etc, but first a brief sketch of what it is supposed to do:

The overall aim here is to read the incoming NNTP data and convert it
to mbox style files.  (Later processing is performed to generate mail
groups that mirror newsgroups; none of that is involved in this
presentation.)

Some of the recipe is just there to compile some diagnostic info until
I get it cleaned up. So taking the rules one at a time:

1) This rule is supposed to compile a control file that will hold the
   data unaltered, and to compile a list of three headers associated
   with each incoming message (Path, Message-ID, Xref) in a separate
   file (mgid_file).  All of this is for diagnostics only.

 :0c :
 |$AWK 'FNR==1{headers=1} \
      headers {print >> "control.in"} \
      headers && /^Path: / {print $0 >> "mgid_file"} \
      headers && /^Message-[Ii][Dd]: / {print $0 >> "mgid_file"} \
      headers && /^Xref: / {print  $0 "\n-- " >> "mgid_file"} \
      /^$/ {headers=0}'
(NOTE: The mbox `From ' line is absent in `control.in' created here)
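
To check rule 1 in isolation, the awk script can be fed a hand-made
article outside procmail.  This is only a sketch: the sample headers,
the temporary directory and the `workdir' variable are made up for
illustration; the awk program is the one from the rule.  As written,
the script appears to append only the header block (plus the blank
separator line) to control.in, not the body.

```shell
# Work in a scratch directory so control.in and mgid_file start empty.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical article: three headers, a blank line, one body line.
printf '%s\n' \
  'Path: news.example.invalid!not-for-mail' \
  'Message-ID: <1@example.invalid>' \
  'Xref: news.example.invalid comp.lang.awk:1' \
  '' \
  'Path: this body line must be ignored' |
awk 'FNR==1 {headers=1}
     headers {print >> "control.in"}
     headers && /^Path: / {print $0 >> "mgid_file"}
     headers && /^Message-[Ii][Dd]: / {print $0 >> "mgid_file"}
     headers && /^Xref: / {print $0 "\n-- " >> "mgid_file"}
     /^$/ {headers=0}'

# Show what was collected.
cat mgid_file
```

The body line that starts with `Path: ' is skipped because the
`headers' flag is cleared at the first empty line.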

2) Just a filter rule to grab the Xref header and rewrite it as:
   `X-Save-Xref: $MATCH' and remove the dot on a line by itself
   present in nntp data.

    :0fW
   * ^Xref:( |\t)*..\/.*
    | formail -I "X-Save-Xref: ${MATCH}" \
    |sed '/^\.$/d'
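
The sed half of rule 2 can be tried on its own.  A minimal sketch
with an invented sample message; `filtered' is a hypothetical
variable name.  It shows that only a line consisting of a single dot
(the NNTP end-of-article marker) is dropped, while a line that merely
starts with dots is kept.

```shell
# Hypothetical article text as it might arrive from the NNTP stream.
filtered=$(printf '%s\n' \
  'Subject: test' \
  '' \
  'body line' \
  '..line that only starts with dots' \
  '.' |
  sed '/^\.$/d')

printf '%s\n' "$filtered"
```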
 
3) I think some messages come thru without Path headers or maybe my
   setup is splitting incoming data where it shouldn't be split.  This
   rule is to help diagnose that.

    :0
   * ! ^Path:
    1x.NO_PATH.in

4) I've also suspected some come thru with no Newsgroups header.  This
   is supposed to trap them.

    :0
   * ! ^Newsgroups: 
    1x.NO_NEWSGROUPS.in

5) This rule is the workhorse that generates files based on data in
   the Newsgroups field.  The regex is very long, so it is reduced
   here to (..massive list here..), which stands for some 20+
   newsgroups in regex format.  The full regex is included with the
   actual procmailrc at the end.

    :0
    * ^Newsgroups:(.*,)?[ \t]*\/(..massive list here..)
    {
       DELIVERY=$MATCH
 
      :0 :
      1x.${DELIVERY}.in
    }
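
The intent of rule 5 can be approximated in plain shell to see which
file name a given Newsgroups value would produce.  This is a rough
sketch and not procmail's own matching: `wanted' is a hypothetical
three-entry stand-in for the long alternation, `pick_group' is a
made-up helper, and it simply takes the first listed group that is on
the list.  Which of several matching groups procmail itself picks
depends on its own matching rules.

```shell
# Hypothetical allow-list standing in for "(..massive list here..)".
wanted='comp.editors comp.lang.awk gnu.emacs.help'

# pick_group VALUE: print the first comma-separated group in VALUE
# that also appears in $wanted; print nothing if none does.
pick_group() {
  printf '%s\n' "$1" | tr ',' '\n' | while read -r group; do
    for w in $wanted; do
      if [ "$group" = "$w" ]; then
        printf '%s\n' "$group"
        break 2    # stop at the first hit
      fi
    done
  done
}

DELIVERY=$(pick_group 'alt.test, comp.lang.awk,comp.editors')
echo "1x.${DELIVERY}.in"
```

An empty result corresponds to the case where the message falls thru
to the later rules.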

6) Finally a little catchall rule for stuff I forgot.

    :0
    1x.misc_news.in

========================================

Actual procmailrc 

PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin
SHELL=/bin/bash
LOGABSTRACT=ALL
ORGMAIL=$BASE/$LOGNAME
DEFAULT=$ORGMAIL
VERBOSE=YES
BASE=$HOME/projects/proc/suck_test2
LOGFILE=$BASE/.log
LOG="`date +'%b %d %T %w '`
"
MAILDIR=$BASE/spool_test2
TRAP="formail -XMessage-Id:"
AWK=/bin/awk

## Create a control file with data as formail left it, and a separate
## file of three headers associated with each file (Path mgid Xref)
 :0c :
 |$AWK 'FNR==1{headers=1} \
      headers {print >> "control.in"} \
      headers && /^Path: / {print $0 >> "mgid_file"} \
      headers && /^Message-[Ii][Dd]: / {print $0 >> "mgid_file"} \
      headers && /^Xref: / {print  $0 "\n-- " >> "mgid_file"} \
      /^$/ {headers=0}'

## Snag Xref header and include it under X-Save-Xref
## Remove any lines with a dot by itself on the line
    :0fW
   * ^Xref:( |\t)*..\/.*
    | formail -I "X-Save-Xref: ${MATCH}" \
    |sed '/^\.$/d'
 
## catch any messages with no Path header
    :0
   * ! ^Path:
    1x.NO_PATH.in

## catch any messages with no Newsgroups header
    :0
   * ! ^Newsgroups: 
    1x.NO_NEWSGROUPS.in

## Match something from Newsgroups field to generate file name
    :0
    * ^Newsgroups:(.*,)?[ \t]*\/(alt\.test\.yer\.posts|alt\.solaris\.x86|comp\.editors|comp\.emacs|comp\.lang\.awk|comp\.lang\.perl\.moderated|comp\.mail\.sendmail|comp\.os\.linux\.security|comp\.security\.ssh|comp\.unix\.questions|comp\.unix\.shell|comp\.os\.linux\.networking|comp\.software\.config-mgmt|gnu\.cvs\.help|gnu\.emacs\.gnus|gnu\.emacs\.help|mailing\.freebsd\.net|mailing\.freebsd\.security|mailing\.freebsd\.questions|news\.software\.nn|news\.software\.nntp|newsguy\.general|newsguy\.test|redhat\.networking\.general)
    {
       DELIVERY=$MATCH
 
      :0 :
      1x.${DELIVERY}.in
    }

## catch anything left over
    :0
    1x.misc_news.in