procmail

Re: Procmail experiments -- good methods

1999-05-16 18:15:18
era eriksson <era(_at_)iki(_dot_)fi> writes:

On 15 May 1999 05:25:22 -0700, Harry Putnam <reader(_at_)newsguy(_dot_)com>
wrote:
 > Original headers start immediately below the ^BEGIN line but are not
 > in unix format.  The "From " line and "Return-Path" lines are missing.

Are you sure you need them? Both? (Usually the contents of Return-Path
are derived at delivery from the From_ line.)

I see, it isn't necessary to have both.  I wanted the files to look
like normal mail.  I was under the wrong notion that the "Return-Path:" 
header was part of what makes apps like mutt see "mail" instead of a
text file.  Still just starting to catch on to what makes mail tick.

Actually I was under the impression that MH didn't use From_ lines at
all, but since I don't use MH myself, this should definitely be taken
with a grain of salt.

Probably doesn't.  I'm not actually using MH at all.  I just wanted the
data in the format <Directory>/1 2 3 4 etc.  It seems easier to search in
that format: "grep -l" returns a file like `mail-ex/Jun/5' that is
just one message, instead of a reference to a file that is
7MB long.
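
To illustrate the one-message-per-file layout described above, here is a minimal sketch that splits an mbox file into numbered files so that "grep -l" names a single message. The function name split_mbox and the paths in the usage comment are hypothetical, and the splitting rule is deliberately naive.

```shell
# Hypothetical helper: split an mbox file into numbered one-message files.
# Naive on purpose: every line starting with "From " begins a new message,
# so body lines are assumed to be escaped already (">From ").
split_mbox() {
    mbox=$1
    dir=$2
    mkdir -p "$dir"
    awk -v dir="$dir" '
        /^From /  { if (out != "") close(out); n++; out = dir "/" n }
        out != "" { print > out }
    ' "$mbox"
}

# Usage (paths are made up):
#   split_mbox mail-ex/Jun.mbox mail-ex/Jun
#   grep -l 'some pattern' mail-ex/Jun/*
```

Each hit from grep is then a path such as mail-ex/Jun/5 rather than the name of one large mailbox file.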

It also means I can access it from (Emacs) Gnus, with the "nneething"
backend and see the month directories as Newsgroups.  It works a lot
like "dired" in emacs only the files are messages.  (No write access
that way though... unless incorporated into gnus setup)

 > ##Remove stuff added by archive server
 > :0fhbwc
 > * Subject: archive retrieval: latest/[0-9]+
 > | sed  -e '1,/BEGIN------------cut here-------------/d' \
 >        -e '/END--------------cut here-------------/,$d' \
 > ##Replace unix message format ('^From ' and '^Return-Path: ') lines 
 > |awk 'BEGIN {"date"|getline;print "From apollo-list-request(_at_)redhat(_dot_)com", $0 "\nReturn-Path: <apollo-list-request(_at_)redhat(_dot_)com"};{print}'

:f and :c are usually not meaningful at the same time. Take out the c
flag is my advice.

Yes, I see it defeats the filtering now that you point it out, and in
fact I'd removed it from my experimental files.  But apparently I pasted
into that message before I got that far.

(The comments in the action part are of course not part of the real
live recipe you're using, correct?)

Hey, you finally hit on something I actually knew... : )


Since the significance of the date on the From_ line is pretty much
zero here (if it isn't, how about you derive it from the Date: header
of the digested message [a very imprecise science] instead of the time
when you happened to process it?), you might as well fill it in with a
bogus date instead. That means you can use the same sed script for all
your processing. (Just put in something like Thu Jan 01 00:00:00 1970.
Or if you need a real date, how about produce all the headers you need
with date(1) -- something like

These are good clues... but you're right in thinking the "From" line or
date are not meaningful here.  But since I'm trying to learn more
about procmail, I was acting as if they were.
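
The bogus-date suggestion can be sketched as a small shell filter. This is only an illustration: the function name and the example.com address are placeholders, the date string is the one suggested above, and the sed expressions are simplified versions of the ones in the quoted recipe.

```shell
# Hypothetical filter: strip the archive wrapper and prepend a fixed envelope.
# The sender argument is a placeholder; the date is deliberately bogus.
unwrap_with_fixed_envelope() {
    sender=$1
    printf 'From %s Thu Jan 01 00:00:00 1970\n' "$sender"
    printf 'Return-Path: <%s>\n' "$sender"
    # Drop everything up to and including the BEGIN marker,
    # and everything from the END marker to the end of input.
    sed -e '1,/BEGIN.*cut here/d' -e '/END.*cut here/,$d'
}

# Usage: unwrap_with_fixed_envelope list-request@example.com < wrapped-message
```

Because the envelope lines are constant, no call to date or awk is needed for each message.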

Tell me more about deriving the date from the Date: header if you have
time.

    date -d "whatever" +'From apollo-list-request(_at_)redhat(_dot_)com  %c
Return-Path: <apollo-list-request(_at_)redhat(_dot_)com>'

Still another way..... good
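
One possible way to derive the date from the Date: header, sketched under some assumptions: GNU date is available (the -d option is not in POSIX), the message carries a Date: header that GNU date can parse, and the sender address contains no % characters. The function name and the example.com address are made up for illustration.

```shell
# Hypothetical helper: print envelope lines dated from the message's own
# Date: header. Reads the message on stdin. Assumes GNU date.
envelope_from_date() {
    sender=$1
    # Take the first Date: header; quit at the blank line that ends the headers.
    hdr_date=$(sed -n -e '/^$/q' -e 's/^Date: *//p' | head -n 1)
    # A fixed locale and time zone keep the output reproducible.
    LC_ALL=C TZ=UTC date -d "$hdr_date" +"From $sender %a %b %e %H:%M:%S %Y
Return-Path: <$sender>"
}

# Usage: envelope_from_date list-request@example.com < message
```

As noted above this is an imprecise science: Date: headers in the wild are not always well formed, and date exits with an error when it cannot parse one.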

... I also happened to notice you're missing a close bracket on the
Return-Path.)

Ahh .. fortunate that this is experimental eh?

Procmail isn't exactly blazingly fast, in my experience. Getting rid
of the extra awk script should help some, but probably not make a
world of a difference. How long does it take if you just deliver those
to /dev/null with no processing whatsoever?

About half... apparently the sloppy scripting was causing the slowness.
Your posts and the others in this thread have given me lots to
consider and play with.  But it is taking plenty of time to soak in.. he he.

I finally noticed that this particular archive format doesn't even
need so much scripting at all, and have gone to a simpler .procmailrc
with a sed script:

3,/BEGIN.*cut here/d
/END.*cut here/, $d 

This just removes everything after the "From " and "Return-Path" lines
that the archive machine has added (including the first blank line) and
lets those two lines be the first two of the original headers. So the
original headers plus the requisite blank line are now complete.

 1 From blab Sun May 16 18:00:49 PDT 1999
 2 Return-Path: blab
 3 archive
 4 machine
 5 headers
 6 [blank]
 7 misc file info
 8 BEGIN cut here ---
 9 begin Received lines from original message 
10  orig
11 headers
12 [blank]
body

So just snatch out 3 thru 8.
I knew I was making this too complicated.
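
As a rough check of the two-line sed script, here is a sketch that runs it over a made-up message shaped like the numbered listing above. The function name and the sample header contents are invented for illustration; only the two sed expressions come from the script itself.

```shell
# Hypothetical wrapper around the two-line sed script.
strip_archive_wrapper() {
    # Delete line 3 through the BEGIN marker, then the END marker through EOF.
    sed -e '3,/BEGIN.*cut here/d' -e '/END.*cut here/,$d'
}

# Made-up sample in the shape of the numbered listing.
printf '%s\n' \
    'From blab Sun May 16 18:00:49 PDT 1999' \
    'Return-Path: blab' \
    'X-Archive: made-up archive header' \
    '' \
    'misc file info' \
    'BEGIN------------cut here-------------' \
    'Received: from somewhere (made-up)' \
    'Subject: original message' \
    '' \
    'body' \
    'END--------------cut here-------------' |
    strip_archive_wrapper
```

What remains is the two envelope lines followed directly by the original headers, the blank line, and the body.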