procmail

Re: subject cleaning

1999-05-19 02:34:37
On 18 May 1999, Tony Lam <Tony(_dot_)Lam+procmail(_at_)Eng(_dot_)Sun(_dot_)Com> wrote:
"L" == Liviu Daia <Liviu(_dot_)Daia(_at_)imar(_dot_)ro> writes:

L> On 17 May 1999, Tony Lam <Tony(_dot_)Lam+procmail(_at_)Eng(_dot_)Sun(_dot_)Com> wrote:

I'd like to do some subject cleaning with procmail so that:

Subject:  [Fwd: Re: misc stuff]            => Re: misc stuff (fwd)
Subject:  Re: [Fwd: Re: misc stuff] (fwd)  => Re: misc stuff (fwd)
Subject:  [Fwd: [Fwd: Re: [ace-users] misc stuff] ]  => Re: [ace-users] misc stuff (fwd)

I have the following recipe that only handles the first case:
L> [...]
I need help getting it to work with cases 2 and 3. Note that the
[Fwd: .* ] wrapper can be nested several levels deep. It doesn't look
easy to me.

L>     It isn't.  As usual, the problem is there's no way to tell
L> procmail to stop adding characters to $MATCH when it hits a certain
L> pattern.  That's why your best bet is to do the whole thing with
L> sed:

OK, expensive, but I don't really mind as long as it does the job.

    For the reason I stated above, you can't do it with procmail alone.
There's no reason to spawn two copies of sed though.

L> :0
L> * ^Subject:[       ]+\/.*\[Fwd:.*
L> {
L>   subj = `echo "$MATCH" | sed \
L>    -e 's/^\(\(\[Fwd\|Re\):[        ]*\)*/Re: /' \
L>    -e 's/\(\][     ]*\|[   ]*(fwd)\)*$/ (fwd)/'`

L>   :0 fwh
L>   formail -I "Subject: $subj"
L> }

This does not quite work for me.

    Let's see.  It doesn't work because, unlike your first example, in
real life the "]" can be preceded by a space (or "(fwd)" can be followed
by a space).  Ok, try this:

echo "[Fwd: [Fwd: Re: [ace-users] misc.jet] (fwd) ] (fwd)" | sed \
        -e 's/^\(\(\[Fwd\|Re\):[        ]*\)*//' \
        -e 's/\([       ]*\][   ]*\|[   ]*(fwd)[        ]*\)*$//'

--- which translates to the recipe

# NB: each bracket expression such as "[    ]" is meant to hold a space and a tab.
:0
* ^Subject:[    ]+\/.*\[Fwd:.*
{
  subj = `echo "$MATCH" | sed \
        -e 's/^\(\(\[Fwd\|Re\):[        ]*\)*/Re: /' \
        -e 's/\([       ]*\][   ]*\|[   ]*(fwd)[        ]*\)*$/ (fwd)/'`

  :0 fwh
  formail -I "Subject: $subj"
}
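
To check the two substitutions outside procmail, here is a minimal sketch.
The function name clean_subject is hypothetical (it is not from the thread).
The sketch swaps in "sed -E" with [[:space:]] for the "\|" alternation and
the literal space/tab brackets used above, on the assumption that the local
sed accepts -E (GNU and BSD sed do).

```shell
# Hypothetical helper: apply the two substitutions from the recipe to one
# subject string passed as $1.
clean_subject() {
    # 1st -e: replace any leading run of "[Fwd:" / "Re:" prefixes with "Re: ".
    # 2nd -e: replace any trailing run of "]" and "(fwd)" tokens, in any
    #         order and with any spacing, with a single " (fwd)".
    printf '%s\n' "$1" | sed -E \
        -e 's/^((\[Fwd|Re):[[:space:]]*)*/Re: /' \
        -e 's/([[:space:]]*\][[:space:]]*|[[:space:]]*\(fwd\)[[:space:]]*)*$/ (fwd)/'
}

clean_subject '[Fwd: Re: misc stuff]'
clean_subject 'Re: [Fwd: Re: misc stuff] (fwd)'
clean_subject '[Fwd: [Fwd: Re: [ace-users] misc stuff] ]'
```

The three calls use the subjects from the original request. Note that the
first substitution always writes "Re: ", even when the forwarded subject had
no "Re:" of its own, so the recipe's "\[Fwd:" condition is what keeps it away
from ordinary mail.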

Here is what I came up with along the lines of your idea:

echo "[Fwd: [Fwd: Re: [ace-users] misc.jet] (fwd) ] (fwd)" | \
sed -e 's/\[Fwd: //g' | \
sed -e 's/\(\][       ]*(fwd)\)*[     ]*\][   ]*\((fwd)?\)$//g'
                               ^                       ^
                               |                       |   
                               |                       |

This still doesn't work when I have the above '*' and '?',

    Of course it doesn't, you are hard-coding an association between "]"
and "(fwd)".

which seem necessary to me for an optional match. But it works if I take
them out. I'm not a regex/sed expert; any suggestions?

    Well, I'm not a regex/sed expert either...
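
On the '?' question, one point not raised above may also matter: in the
basic regular expressions that plain sed uses, an unescaped "?" is an
ordinary character, so "(fwd)?" only matches when a literal question mark
follows "(fwd)". A small sketch of the difference, assuming a sed that
accepts -E; both function names are hypothetical:

```shell
# Basic regex (plain sed): "(", ")" and "?" are all literal characters here,
# so this only fires when the line really ends in "(fwd)?".
bre_tail() {
    printf '%s\n' "$1" | sed 's/(fwd)?$/<hit>/'
}

# Extended regex (sed -E): literal parentheses are escaped, and "?" makes the
# whole "(fwd)" group optional after the closing bracket.
ere_tail() {
    printf '%s\n' "$1" | sed -E 's/[[:space:]]*\][[:space:]]*(\(fwd\))?$//'
}

bre_tail 'misc.jet ] (fwd)'     # printed unchanged: no literal "?" at the end
bre_tail 'misc.jet ] (fwd)?'    # prints: misc.jet ] <hit>
ere_tail 'misc.jet ] (fwd)'     # prints: misc.jet
ere_tail 'misc.jet]'            # prints: misc.jet
```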

Thanks.

    HTH.

    Regards,

    Liviu Daia

-- 
Dr. Liviu Daia               e-mail:   Liviu(_dot_)Daia(_at_)imar(_dot_)ro
Institute of Mathematics     web page: http://www.imar.ro/~daia
of the Romanian Academy      PGP key:  http://www.imar.ro/~daia/daia.asc
