procmail
Re: Header insertion from message body...

1999-12-03 18:36:40
Nathan Edwards wrote,

| David Tomkin suggested ...

... that you spell my name "Tamkin" the way the INS did when my great-grand-
uncle came through Ellis Island.  Thanks.  Now, here we go.

|   :0 c
|   HEADEREND=| /usr/bin/sed -n -e '1,/^$/{/^$/=;}' -e '/^$/q'

OK, first, `c' is redundant on a variable capture recipe.  Second, since
you are exiting sed at the first blank line, you don't need the range on the
first sed instruction.  Third, since you are not reading the entire input,
you need the `i' flag.  So far we have this:

   :0i # better to set PATH than to keep using absolute paths, I feel
   HEADEREND=| sed -e /./d -e = -eq

Oh, heck, why don't we just get the number of the last line in the head?

   :0h
   HEADEREND=| sed -n \$=
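
To see that the two sed approaches above report the same number, here is a small sketch that runs both outside procmail.  The sample message and the variable names are made up for illustration, and it assumes that the head procmail hands to an `h' recipe ends with the blank separator line, which is what makes the two numbers agree.

```shell
# Sample message (invented for illustration): two header lines,
# a blank line, then a body.
msg='From: nathan@example.com
Subject: test

first body line'

# Approach 1: discard non-blank lines, print the line number of the
# first blank line, and quit.  $(...) strips the empty line that sed
# autoprints on its way out.
first_blank=$(printf '%s\n' "$msg" | sed -e '/./d' -e '=' -e 'q')

# Approach 2: feed only the head (up to and including the blank line,
# standing in for procmail's `h' flag) and print the last line number.
head_lines=$(printf '%s\n' "$msg" | sed '/^$/q' | sed -n '$=')

echo "first blank line: $first_blank, last line of head: $head_lines"
```
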

|   #   And add one...
|   BODYSTART=`expr $HEADEREND + 1`

(Nathan called expr by its basename with no absolute path; why not sed?)  We
don't need expr, nor $HEADEREND then:

  :0h
  * $ `sed -n '$='`^0
  * 1^0
  { BODYSTART = $= }

or, since a quirk in procmail makes it count one too many when you count
whole lines in a search area that ends with a newline,

  :0 # H without B is the default
  * 1^1 ^.*$
  { BODYSTART = $= } # We don't even need sed.

but guess what: we will end up dropping all of that, because we won't need
$BODYSTART either.

Let's move on to the recursively called INCLUDERC:

| The file redir.loop:
| 
| # This is just a variable assignment, so don't count this one as
| # delivery. b flag pipes only the body, the sed command picks out
| # just the first line and stores it in $FIELD.
| :0 bc
| FIELD=|/usr/bin/sed '1 q'

Again, no `c' on a variable capture recipe, and there's no need for sed:

  :0B
  * ^^\/.*
  { FIELD="$MATCH" } # quotes not strictly necessary, but they make me feel safe

| # If $FIELD looks like a header, then...
| :0
| * FIELD ?? ^[-a-zA-Z0-9]+:

Then we don't need $FIELD at all; we can just change the flag line and the
condition to these:

  :0B # without D, no need for "a-zA-Z"
  * ^^[-a-z0-9]+:
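
For trying the pattern outside procmail, a rough shell equivalent follows.  The function name is hypothetical, and `grep -i' is only a stand-in for procmail's default case-insensitive matching (the reason "a-zA-Z" is unnecessary without the D flag); `sed 1q' plays the part of the ^^ anchor by looking at the first line only.

```shell
# Hypothetical helper: succeed if the first line of the given text
# starts like a header field (name characters followed by a colon).
looks_like_header() {
    printf '%s\n' "$1" | sed 1q | grep -Eiq '^[-a-z0-9]+:'
}

looks_like_header 'X-Old-Header: value' && echo "header-like"
looks_like_header 'Hello, world'        || echo "not header-like"
```
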
  
| {
|         # Remove the first line of the body...
|         :0 f
|         | /usr/bin/sed "$BODYSTART d"

If the goal is to remove the first line of the body, you don't need to pipe
the head as well.  Leave the head alone:

          :0bf
          | tail -n +2 # tail, being simpler than sed, is cheaper to fork
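
A quick check of what that filter does to a body, run outside procmail.  The sample body is invented, and the sketch uses `tail -n +2', the current POSIX spelling of the older `tail +2'.

```shell
# Sample body (invented): an old header line followed by the real body.
body='X-Old-Header: value
the real body'

# Output starts at line 2, so the first line is dropped.
rest=$(printf '%s\n' "$body" | tail -n +2)
echo "$rest"
```
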

|         # Insert the header with formail. Quote to avoid problems
|         # with special characters. Use of -i rather than -I ensures that
|         # number of lines in the header goes up by one, whether the
|         # field was present previously or not.
|         :0 f
|         | formail -i "$FIELD"

Ah, so what you're really trying to do is to move those old headers that now
are at the beginning of the body up to the head ...

|         # The body now starts one line further down...
|         BODYSTART=`expr $BODYSTART + 1`

Again, procmail's internal arithmetic is cheaper than expr for things that it
can do, and we don't need $BODYSTART anyway.

|         # Iterate...
|         INCLUDERC=redir.loop

Better,

          INCLUDERC = $_

in case you decide later to change the name of the file; then you will have
to edit it only in one place.

| }

So all told, redir.loop serves to exchange the first blank line with the
line after it if the line after it looks like a header:

  :0Bfw # search body, filter whole
  * ^^[-a-z0-9]+:
  | sed -e '/^$/,$ !b' -e /./b -e N -e 's/\(\n\)\(.*\)/\2\1/'

  :0A
  { INCLUDERC = $_ }
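
To watch one pass of that swap, the sed command can be run by hand on a sample message.  The message below is made up, and the only change to the command is the dropped space before `!'.

```shell
# Sample message (invented): the body begins with a stray header line.
msg='From: nathan@example.com
Subject: test

X-Old-Header: value
the real body'

# Lines before the first blank line pass through untouched; at the
# blank line, N pulls in the next line and s/// swaps the two.
swapped=$(printf '%s\n' "$msg" |
    sed -e '/^$/,$!b' -e '/./b' -e 'N' -e 's/\(\n\)\(.*\)/\2\1/')

printf '%s\n' "$swapped"
```

The sample body has no blank lines of its own.  Since the `/^$/,$' range runs to the end of the message, a body with blank lines would, as far as I can tell, have each of those swapped with its following line as well, so treat this as a sketch of the simple case.
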

It still will fail if any of the old headers moved to the body have continu-
ation lines.

| However, I'm still left with my original challenge.  It seems like it
| should be possible to do this without iteration.

Actually, yes, and this also takes care of continuation lines in the old
headers properly:

 :0Bfhw # search body, filter head
 * ^^[-a-z0-9]+:
 | formail -X "" # remove blank line at neck

 :0afwh # then in case of duplicate headers, keep last occurrence of each
 | formail -U ""

The reasons that the second recipe is unconditional and has no test for two
or more headers with the same name are these:

1. without backreferences, it would be a very hairy regexp to test for every
possible header name that might appear twice;

2. in case the original headers were pressed against the original body with
no blank line in between, and thus there was no neck now that the two sets of
headers had been concatenated, the second formail call, having neither -f nor
-x nor -X, would restore it.
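
To picture the effect of the first recipe on the message as a whole, here is a sketch that uses sed as a stand-in for it; it does not call formail and does not attempt the de-duplication that the second recipe performs.  The sample message is invented.  Deleting the blank line at the neck joins the old headers to the head in one step, so a continuation line simply travels with its field.

```shell
# Sample message (invented): old headers, one with a continuation
# line, sit at the top of the body.
msg='From: nathan@example.com
Subject: test

X-Old-Header: value
 continued on a second line
X-Another: thing

the real body'

# Delete only the first blank line (the neck); the next blank line
# then becomes the new separator between head and body.
merged=$(printf '%s\n' "$msg" | sed '1,/^$/{/^$/d;}')

printf '%s\n' "$merged"
```
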
