Re: subject processing

On Fri, 4 Jul 97 11:31 EDT, process(_at_)qz(_dot_)little-neck(_dot_)ny(_dot_)us 
(Eli the Bearded) wrote:

> era eriksson <era(_at_)iki(_dot_)fi> wrote:
>
>> :0fhw
>> * ^\/Subject:[      ]*((re|fw|sv|betr|antw)( ?[[][0-9]+])?[-:>][    ]*)+
>> | ... some suitable program which does essentially \
>>   sed "s/^$MATCH/Subject: /"
>
> That is not quite what he wanted because he wished ones with an Re:
> in the subject to continue to have an Re: in the subject. And sed

I was wondering about that actually -- must have missed it in the
original message.

> will not find a match when trying "^Subject: Re[4]: Re: " against
> "Subject: Re[4]: Re: ". [] are metacharacters remember.

That's what the explanation you didn't quote was trying to explain.
(That's why it's pseudo code and not a real script.)
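A quick way to see the metacharacter point is the shell sketch below (the sample Subject line is made up for illustration). In a sed basic regular expression, [4] is a bracket expression matching the single character 4, so the unescaped pattern looks for "Re4:" and never matches. Escaping the left bracket is enough, since a lone ] is already literal.

```sh
#!/bin/sh
# Demonstrates why an unescaped "[" breaks a literal match in sed.
line='Subject: Re[4]: Re: hello'

# Unescaped: "[4]" is a bracket expression matching the single character
# "4", so this pattern wants "Re4:" and the line passes through unchanged.
unescaped=$(printf '%s\n' "$line" | sed 's/^Subject: Re[4]: Re: /Subject: /')

# Escaped: "\[" is a literal "[", and a lone "]" is already literal.
escaped=$(printf '%s\n' "$line" | sed 's/^Subject: Re\[4]: Re: /Subject: /')

printf '%s\n' "$unescaped" "$escaped"
```

The first line printed is the untouched subject, the second is "Subject: hello".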

For the record, here's a different kind of sed snippet which should work: 

  :0fh
  * ^Subject:\/([       ]*(re|fw|sv|antw|betr)( [[][0-9]+])?[->:])+
  * ! MATCH ?? ^^ Re:^^
  | sed -e '/^Subject:/!b' -e ':loop' \
      -e 's/^Subject:[  ]*[Rr][Ee][     ]*[[][0-9][0-9]*][-:>][  ]*/Subject: /' \
      -etloop \
      -e 's/^Subject:[  ]*[Rr][Ee][-:>][        ]*/Subject: /' -etloop \
      -e 's/^Subject:[  ]*[Ff][Ww][-:>][        ]*/Subject: /' -etloop \
      -e 's/^Subject:[  ]*[Ss][Vv][-:>][        ]*/Subject: /' -etloop \
      -e 's/^Subject:[  ]*[Aa]ntw[-:>][         ]*/Subject: /' -etloop \
      -e 's/^Subject:[  ]*[Bb]etr[-:>][         ]*/Subject: /' -etloop \
      -e 's/^Subject: /Subject: Re: /'

If you want my nomination for a program with a disappointing regexp
implementation, that would be sed. (One of the foremost reasons I've
never really learned sed is that you can accomplish the same things
with Perl without all the silly dead ends sed has.)
  (Strictly speaking, you should also use [Ss][Uu][Bb][Jj][Ee][Cc][Tt]
instead of plain Subject everywhere on the left-hand side. I also
cheated with the longer strings [Aa]ntw and [Bb]etr. "Betr" is one I
added myself anyway. :-) And if you see Fw [2]:, Betr [2]: etc. too,
those will need their own lines as well.)

I believe this will have a significantly smaller footprint than Perl
and accomplish the same task. (Allegedly, Perl is often faster than
sed even on sed's home turf, but for a quickie such as this, the
loading time might matter more than the actual running time of the
script.)

In practice, I would still probably try my original suggestion and a
simplistic script of some sort to do the substitution for $MATCH.
(After all, there are only the occasional left brackets to escape [the
right brackets are not magical by themselves], so it would be pretty
easy to write another sed script to add backslashes before them. That
would obviate the need for this [UPPERlower] nonsense, too: there's
only a literal string to substitute.)
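The escape-then-substitute idea might look something like this sketch. It assumes $1 holds what the \/ in the original ^\/Subject:... condition extracted, i.e. "Subject:" plus the whole run of prefixes; the function name strip_subject_prefix and the sample headers are made up. It puts back a single "Re: " so that replies stay replies.

```sh
#!/bin/sh
# Hypothetical helper: strip the literal prefix run held in $1 (procmail's
# $MATCH) from the Subject: line and put back a single "Re: ".
strip_subject_prefix () {
    # In the prefixes that regexp can match, "[" is the only character
    # that is magic in a sed BRE: "]" is literal on its own, and "/"
    # cannot occur, so it is safe as the s/// delimiter.
    escaped=$(printf '%s\n' "$1" | sed 's/\[/\\[/g')
    # MATCH keeps the exact case found in the header, so no [Rr][Ee]
    # tricks are needed: this is a literal string substitution.
    sed "s/^$escaped/Subject: Re: /"
}

MATCH='Subject: RE[4]: Fw: '
printf 'From: a@example.com\nSubject: RE[4]: Fw: lunch\n' | strip_subject_prefix "$MATCH"
```

This prints the From: line unchanged followed by "Subject: Re: lunch". If the pattern ever admitted other BRE specials such as . or *, the escaping sed command would have to cover them as well.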

Perhaps I'm unduly paranoid, but if the script is potentially going to
handle thousands of messages a day, it's worth a bit of effort to find
the most economical solution. (I imagine many scripts presented on
this list end up in at least a few .procmailrc:s other than the
original author's sooner or later.)

With dynamic loading it doesn't seem that bad. On my machine:

     SIZE    RSS     COMMAND         (version)
      984    372     procmail        3.10
     1488    592     perl            5.004
     1644    768     procmail        3.11pre7 with perlembed


How would sed score here? 

/* era */

who has finally snored past the boring first chapters of the O'Reilly
sed && awk book :-)

-- 
Defin-i-t-e-ly. Sep-a-r-a-te. Gram-m-a-r.  <http://www.iki.fi/~era/>
 * Enjoy receiving spam? Register at <http://www.iki.fi/~era/spam.html>