procmail

Re: Procmail experiments -- good methods, continued

1999-05-17 11:46:36
Harry Putnam continued,

| David your posts are giving me plenty to work with.  Not that
| conversant with sed or awk so am stumbling around with them.
| Trying to follow along though.

Glad to help.  Good luck.

| One thing I haven't found a reference to is the ` -ep' in your script.
| What is its function?

It means -e p -- in other words, -e to introduce another sed instruction
as a separate argument, and p as the sed instruction to print the pattern
space to standard output.
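
A minimal sketch of that equivalence, assuming a POSIX shell and any common sed; the two-line input is made up for illustration:

```shell
# -n turns off sed's automatic printing, so the p instruction is the
# only thing that prints.  "-ep" and "-e p" hand sed the same instruction.
attached=$(printf 'one\ntwo\n' | sed -n -ep)
separate=$(printf 'one\ntwo\n' | sed -n -e p)
printf '%s\n' "$attached"
printf '%s\n' "$separate"
```

Both commands should print the input unchanged, one copy of each line.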

| >         -e '/END--------------cut here-------------/q' -ep \
| 
| I've now moved to a simpler sed script  and dropped awk altogether.  I
| was overcomplicating things quite a bit.  The latest .procmailrc looks
| like this:

|  :0fhbw
|  * ^From apollo-list-request@
|  * ^Subject: archive retrieval: latest/[0-9]+
| | sed -f ~/projects/awk-sed/sedsc/sedsc-redhatlists

Again, `hb' is the default, so you can drop both and use :0fw alone, or
actually :0fwi as I'll explain later.

| The sed script called here looks like:
| 3,/BEGIN.*cut here/d
| /END.*cut here/, $d

Why from line 3?  What is worth keeping on line 2?

| Solves the header stuff with less fanfare.. Not sure what the $d does
| but it doesn't work with out it.

It isn't "$d" but rather /END.*cut here/,$  d  -- from the first line
containing "END[anything or nothing]cut here" through the last line of the
input, delete all lines.  Inside a script file you can safely put the dollar
sign next to the d, but don't read them as a unit.  The dollar sign is an
address meaning "the last line"; it belongs with the comma and the regexp
before it, not with the d command.
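
To see the address range at work, here is a small sketch, assuming a POSIX shell; the five-line sample is invented and only imitates the cut-here markers:

```shell
# Invented sample input imitating the archive's cut-here markers.
sample='header
BEGIN------cut here------
body line
END--------cut here------
trailer'

# Address range: from the first line matching the regexp through "$"
# (the last line).  d deletes every line in that range.  The single
# quotes keep the shell from touching the dollar sign.
kept=$(printf '%s\n' "$sample" | sed '/END.*cut here/,$ d')
printf '%s\n' "$kept"
```

Only the first three lines should survive; the END line and everything after it are deleted.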

In any case, deleting from a line in the file all the way to the end is usually
an inefficient way to use sed.  If something tells you that you want to drop
the rest of the input, use sed's q command and don't waste the cycles to read
and process any more.  (Of course, within a procmailrc that means adding the
`i' flag to tell procmail that it's OK for the program in the action line to
exit before reading the entire input.)  So if you want to stop copying lines
at "foo" and blow off the rest of the file,

  sed /foo/q # no -e necessary when all instructions are in one shell parameter

It's slightly trickier if /foo/ isn't the last line to print but rather the
first line NOT to print.  Then you do this:

  sed -n -e /foo/q -e p

which can also be written,

  sed -ne /foo/q -ep

or in many versions of sed,

  sed -n '/foo/q;p'

While   sed '/foo/,$ d'  would produce the same results, it would need to
spend time and cycles reading all the way to the end of the input, dropping
lines one at a time, instead of taking the fast way out when its job is done.
Just remember to add the `i' flag.
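
The three forms can be compared side by side.  This is only a sketch, assuming a POSIX shell; the input text is invented:

```shell
# Invented input: two lines to keep, a marker line, two lines to drop.
input='keep 1
keep 2
foo marker
drop 1
drop 2'

# /foo/ is the last line to print: without -n, q prints the current
# line and then quits.
with_marker=$(printf '%s\n' "$input" | sed /foo/q)

# /foo/ is the first line NOT to print: with -n, q quits silently and
# p prints only the lines seen before the marker.
quit_early=$(printf '%s\n' "$input" | sed -n -e /foo/q -e p)

# Same output as quit_early, but sed reads the input all the way to the end.
read_all=$(printf '%s\n' "$input" | sed '/foo/,$ d')

printf '%s\n---\n%s\n---\n%s\n' "$with_marker" "$quit_early" "$read_all"
```

The second and third results should be identical; the difference is only in how much input sed has to read before it finishes.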

| The rest is just sending to mnth folders.  However I'm having a bit of
| trouble seeing how using the 'a' is different than enclosing in block
| {}. The man  pages seem to be saying they do the same thing.

`a' means "if the conditions matched on the last recipe that has neither an
`A' flag nor an `a' flag and if also the last attempted action succeeded,"
though Philip could describe it better.

|    :0a
|    * ^Date:.*Nov
|    /home/reader/projects/proc/apollo/Nov/.
| 
|    :0a
|    * ^Date:.*Dec
|    /home/reader/projects/proc/apollo/Dec/.

What, twelve recipes for that?  Goodness, no, one will do:

     :0a
     * ^Date:.*\/(Jan|Feb|Ma[ry]|Apr|Ju[nl]|Aug|Sep|Oct|Nov|Dec)
     /home/reader/projects/proc/apollo/$MATCH/.

though I hope you're not trying to apply it after the filter we were
discussing above, because there will be no Date: header left after the
filtering.
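
To see roughly what ends up in $MATCH, here is a shell approximation using sed -E.  It is not procmail itself: the Date: header is a made-up sample, sed is case-sensitive where procmail by default is not, and sed's greedy matching only approximates how procmail handles the text to the left of \/ .

```shell
# Hypothetical header line, for illustration only.
header='Date: Mon, 17 May 1999 11:46:36 -0400'

# Pull out the month abbreviation, roughly as the \/ token would
# capture it into MATCH.  Day names such as Mon or Sun do not collide
# with any of the alternatives.
month=$(printf '%s\n' "$header" |
  sed -nE 's/^Date:.*(Jan|Feb|Ma[ry]|Apr|Ju[nl]|Aug|Sep|Oct|Nov|Dec).*/\1/p')

# The folder the single recipe would then deliver to.
folder="/home/reader/projects/proc/apollo/$month/."
printf '%s\n' "$folder"
```

For this sample header the month comes out as May, so the delivery folder is the May directory.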

| One minor aspect of all this is that while processing the archive
| messages I sometimes get a phantom  "From foo@bar" showing up in the
| experimental DEFAULT mail box, that has no other lines.

I think sometimes the script is getting null input.  A verbose logfile would
help uncover the cause.