procmail

Re: line-by-line processing (was: Filtering out unwanted mime attachments)

1997-10-20 13:43:18
"Edward J. Sabol" <sabol(_at_)alderaan(_dot_)gsfc(_dot_)nasa(_dot_)gov> writes:
OK, I'm starting to think this is a bug with procmail. Can someone confirm
before I bring this to Stephen's attention?

"That's not a bug, it's a feature"  (but not yet)

Procmail explicitly strips one leading newline from whatever is matched.


"Brother, read from the Book of Armaments!"
"Book of Armaments, chapter src/regexp.c, verses 601 and 602:"

           if(*bom=='\n')
              bom++;                            /* strip one leading newline */


This is so that a match that starts with a ^ doesn't include the
newline, as in the condition:

        * SOMEVAR ?? blah blah blah \/^(whatever|something)


With enough effort, however, you can work around it.  Don't barf:


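        # REMAININGLINES: everything after the first line of BODYLINES
        :0
        * BODYLINES ?? ^^.*$\/(.*$)+
        { REMAININGLINES = $MATCH }

        # E flag: runs only if the recipe above did not match
        :0E
        { REMAININGLINES }

        # THISLINE: the first line of BODYLINES
        :0
        * BODYLINES ?? ^^\/.*$
        { THISLINE = $MATCH }

        :0E
        { THISLINE }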

        # Okay, make sure THISLINE didn't lose a leading newline
        :0
        * ! BODYLINES ?? $ ^^$\THISLINE
        { THISLINE = "
        $THISLINE" }

        # Now check REMAININGLINES
        :0
        * ! BODYLINES ?? $ ^^$\THISLINE($)$\REMAININGLINES^^
        { REMAININGLINES = "
        $REMAININGLINES" }


        LOG="REMAININGLINES = |$REMAININGLINES|
        THISLINE = |$THISLINE|
        "



HOWEVER... this will expose a latent procmail bug.  When procmail does
the stripping (with the two lines of C shown above), it fails to
decrease the length of the match.  So when the first line of what is
being matched is empty (i.e., what is being matched starts with a
newline), you get the first character of the _second_ line in the
match.  Here's the patch:


*** src/regexp.c        1997/04/04 07:28:42     1.1.1.2
--- src/regexp.c        1997/10/20 19:53:00
***************
*** 598,605 ****
           tmemmove(q=(char*)text,bom,len),q[len]='\0',bom=q;
        else
         { char*p;
!          if(*bom=='\n')
!             bom++;                            /* strip one leading newline */
           primeStdout(amatch);p=realloc(Stdout,(Stdfilled+=len)+1);
           tmemmove(q=p+Stdfilled-(int)len,bom,len);retbStdout(p);
         }
--- 598,605 ----
           tmemmove(q=(char*)text,bom,len),q[len]='\0',bom=q;
        else
         { char*p;
!          if(*bom=='\n'&&len)
!           { bom++;len--;}                     /* strip one leading newline */
           primeStdout(amatch);p=realloc(Stdout,(Stdfilled+=len)+1);
           tmemmove(q=p+Stdfilled-(int)len,bom,len);retbStdout(p);
         }


Once you apply that and recompile, the workaround will actually work.

I've sent the above patch off to Stephen, along with a request that the
"strip one leading newline from $MATCH" behaviour be either documented or
fixed to apply only to ^ and not to $.  Once it's documented, of course,
the first line of my reply will actually be true.


Philip Guenther