procmail

Re: Removing line-wrapped header

2004-09-09 16:34:42
On Fri, Sep 10, 2004 at 12:11:20AM +1200, Volker Kuhlmann wrote:
>>   :0
>>   * $ ^^\/(.+$)+$\KNOWN_HFIELD:(.*\<)?$\KNOWN_STRING\>
>>   * MATCH ?? ^^\/(.+$)+.
>>   * MATCH ?? ^^\/(.*$)+..+$
>>   { H_TOP = "$MATCH" }

> Uhm, I don't understand all this, but never mind. KNOWN_STRING should be
> able to be a regex, which means leaving out the $\ and the \<\>. Those
> are only details, though.

Yes, well, of course you can put what you want in that part.
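Volker's point turns on procmail's two substitution forms inside a `$` condition: `$\KNOWN_STRING` is inserted with its regex metacharacters disarmed, while a plain `$KNOWN_STRING` is inserted as a regex. A rough Python analogue (not procmail itself), with `re.escape` playing the role of `$\`; the value `spam.example` is made up for illustration:

```python
import re

# Hypothetical stand-in for KNOWN_STRING; "." is a regex metacharacter.
KNOWN_STRING = "spam.example"

# Analogue of $\KNOWN_STRING: metacharacters escaped, matches only literally.
literal = re.compile(re.escape(KNOWN_STRING))

# Analogue of plain $KNOWN_STRING: the value is used as a regex,
# so "." matches any single character.
as_regex = re.compile(KNOWN_STRING)

print(bool(literal.search("X-Tag: spam.example")))   # True
print(bool(literal.search("X-Tag: spamXexample")))   # False
print(bool(as_regex.search("X-Tag: spamXexample")))  # True
```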


>>   | echo "$H_TOP$H_BTM" | sed "s/$TAB/\\$NL&/g"

> I believe the RFC says a wrapped line continues with at least one
> whitespace character at the beginning of the next line, so you can't
> rely on the tab. Is the gawk solution I just posted slower than this
> echo/sed pipeline? (Not that it really matters.)
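Folding is indeed not tab-specific: RFC 2822 (and its successor RFC 5322) describes unfolding as removing any line break that is immediately followed by whitespace, i.e. a space or a tab. A minimal Python sketch of that rule, assuming the header block is already held as a string; `unfold` is a hypothetical helper name:

```python
import re

def unfold(header_block: str) -> str:
    """Drop every line break (LF or CRLF) that is immediately followed by
    a space or tab, i.e. join folded continuation lines to their field."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_block)

folded = "Subject: a long\n\tsubject line\n  that wraps twice\nTo: b@example.com\n"
print(repr(unfold(folded)))
# 'Subject: a long\tsubject line  that wraps twice\nTo: b@example.com\n'
```

The second continuation line in the sample starts with spaces, which a re-split keyed only on tabs would not restore.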

Yes, I realized that about the tabs, but I have two observations.
First, it's usually tabs, and the few messages I tested it on all looked
kosher when I was finished with the run.  Second, suppose it were a
space: do you really care whether that line doesn't get re-broken?
I mean, you probably wouldn't even notice it.  If it's a whole bunch
of spaces, well, okay, but we could put that in the sed expression too.
Moreover, we could limit the sed expression to the particular
$KNOWN_HFIELD to avoid fallout in other headers that might
have tabs or spaces that are not at fold points.
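The sed command in the recipe inserts a newline before every tab. Below is a Python sketch of that same re-split with the two refinements mentioned here: restricting it to one field and optionally also breaking before runs of spaces. The `refold` name, its `field` and `min_spaces` parameters, and the two-space threshold used in the test are hypothetical, chosen only for illustration:

```python
import re
from typing import Optional

def refold(text: str, field: Optional[str] = None,
           min_spaces: Optional[int] = None) -> str:
    """Insert a newline before each tab, like: sed "s/$TAB/\\$NL&/g".

    field:      if given, only lines starting with "<field>:" are re-split
                (hypothetical option; compared case-insensitively, as
                header field names are).
    min_spaces: if given, also break before runs of at least this many
                spaces (hypothetical heuristic).
    """
    pattern = r"\t" if min_spaces is None else r"\t| {%d,}" % min_spaces
    prefix = None if field is None else field.lower() + ":"
    out = []
    for line in text.split("\n"):
        if prefix is None or line.lower().startswith(prefix):
            # \g<0> puts the matched whitespace back after the new line break
            line = re.sub(pattern, r"\n\g<0>", line)
        out.append(line)
    return "\n".join(out)

unfolded = "X-Junk: one\ttwo\tthree\nX-Other: keep\ttabs\n"
print(refold(unfolded, field="X-Junk"), end="")
```

Limiting the change to one field means tabs inside other headers (X-Other in the sample) are left alone.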

Finally, if you're going to be *that* compulsive about the result
looking *exactly* as it did before, apart from the one header now excised,
then use the perl solution or your awk.
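Neither the perl solution nor Volker's gawk script appears in this message, so the following is not either of them; it is only a Python sketch of the same line-oriented approach, assuming LF or CRLF line endings. `drop_field` is a hypothetical helper: it drops the target field together with its continuation lines when the unfolded field matches a pattern, and copies every other line through unchanged, so the rest of the header looks exactly as before:

```python
import re

def drop_field(header: str, field: str, pattern: str) -> str:
    """Remove each `field` whose unfolded text matches `pattern`, together
    with its continuation lines; copy every other line through unchanged."""
    lines = header.splitlines(keepends=True)
    out = []
    i = 0
    while i < len(lines):
        # A field is its first line plus any following lines that start
        # with a space or tab (folded continuation lines).
        j = i + 1
        while j < len(lines) and lines[j][:1] in (" ", "\t"):
            j += 1
        block = lines[i:j]
        name, sep, _ = block[0].partition(":")
        unfolded = "".join(ln.rstrip("\r\n") for ln in block)
        is_target = bool(sep) and name.strip().lower() == field.lower()
        if not (is_target and re.search(pattern, unfolded)):
            out.extend(block)  # keep this field byte-for-byte
        i = j
    return "".join(out)

header = (
    "From: a@example.com\n"
    "X-Spam: junk KNOWN\n"
    "\tmore junk\n"
    "  and more\n"
    "Subject: hi\n"
)
print(drop_field(header, "X-Spam", r"\bKNOWN\b"), end="")
```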

One could also run the message through sed *first* and mark all line
breaks with something known, such as a tilde, and then split on that
marker at the end instead of on tabs.  But now we've got two sed calls
and an echo, so that's no better than one perl or awk, I suppose.
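The tilde idea can be sketched in Python as well. This version is an interpretation: it marks only the folding points (a newline followed by a space or tab), since those are the breaks the re-split has to restore, and it assumes LF line endings and that `~` never occurs in the header, refusing to run otherwise. `mark_folds` and `restore_folds` are hypothetical names:

```python
import re

SENTINEL = "~"  # assumed marker; must not already occur in the header

def mark_folds(header: str) -> str:
    """Unfold the header, leaving SENTINEL wherever a fold's line break was."""
    if SENTINEL in header:
        raise ValueError("sentinel already present in header; choose another")
    return re.sub(r"\n(?=[ \t])", SENTINEL, header)

def restore_folds(marked: str) -> str:
    """Turn every SENTINEL back into the line break it replaced."""
    return marked.replace(SENTINEL, "\n")

header = "X-Spam: junk\n more junk\nSubject: a\n\tlong subject\n"
marked = mark_folds(header)
print(repr(marked))  # 'X-Spam: junk~ more junk\nSubject: a~\tlong subject\n'
print(restore_folds(marked) == header)  # True
```

Because the marker records where each break was, the round trip restores space-folded and tab-folded lines alike, with no guessing from tabs.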

As for what that stuff up-top does, it took me a while to get it right,
I assure you.  :-)  Basically, the first condition matches from the
top of the message (header only) through "$KNOWN_STRING\>".
Procmail sticks a virtual "$" on the end of the string it searches, so
the next two conditions are a kludge to trick our way back out of that
and get rid of the bottom line (the one with $KNOWN_STRING) of what we
matched.  We re-instantiate MATCH with ever-smaller strings we control
until that last line is gone altogether.  This is necessary because
procmail's MATCH extraction (\/) only goes forward (right), not backward
(left).  (Well, the procmail I use has left-matching!  But I don't use it,
because I want my stuff to be compatible with stock procmail
binaries.)  In the second condition, we take

   ^^................$
     ..............$
     $KNOWN_HFIELD:...$KNOWN_STRING\>

and turn it into

   ^^................$
     ..............$
     .

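Now it's a cinch to get rid of the one character on the
last line: in the third condition we save again (re-instantiate) only
through the last line that has at least *two* characters ("..+$", i.e.
two chars plus ".*" to the end of the line).  The one-character stub
can't satisfy that, so the match stops at the end of the line above it.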
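To make the three conditions concrete, here is a Python emulation of them on a made-up header. It is an approximation, not procmail: procmail's `$` consumes the newline, so `(.+$)+` becomes `(?:.+\n)+`; `\<` and `\>` are approximated with `\b`; and the virtual `$` at the end of the string is modeled as `(?:\n|\Z)`. The sample header and the values of `KNOWN_HFIELD` and `KNOWN_STRING` are hypothetical:

```python
import re

# Hypothetical sample header and stand-ins for the recipe's variables.
header = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "X-Spam: junk KNOWN\n"
    "\tmore junk\n"
    "Subject: hi\n"
)
KNOWN_HFIELD = "X-Spam"
KNOWN_STRING = "KNOWN"

# Condition 1:  ^^\/(.+$)+$\KNOWN_HFIELD:(.*\<)?$\KNOWN_STRING\>
# Matches whole lines from the top, then the field line up to KNOWN_STRING.
match = re.match(
    r"(?:.+\n)+" + re.escape(KNOWN_HFIELD) + r":(?:.*\b)?"
    + re.escape(KNOWN_STRING) + r"\b",
    header,
).group(0)  # top lines + "X-Spam: junk KNOWN"

# Condition 2:  MATCH ?? ^^\/(.+$)+.
# Keep all complete lines plus just one character of the partial last line.
match = re.match(r"(?:.+\n)+.", match).group(0)  # top lines + "X"

# Condition 3:  MATCH ?? ^^\/(.*$)+..+$
# The final line must have at least two characters, so the 1-char stub
# cannot satisfy it and the match ends with the line above it.
match = re.match(r"(?:.*\n)+..+(?:\n|\Z)", match).group(0)

H_TOP = match
print(repr(H_TOP))  # 'From: a@example.com\nTo: b@example.com\n'
```

H_TOP ends up holding exactly the header lines above the offending field, which is what the recipe saves.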

-- 
dman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
