procmail

RE: how to strip SA's message markup?

2004-02-22 16:01:40



From: Bart Schaefer
Sent: Sunday, February 22, 2004 1:48 PM
[...]

On Sun, 22 Feb 2004, Gary Funck wrote:

  formail -s spamassassin -d < old > new

% time formail -s procmail ~/remove_sa_markup.rc < spam-100-msgs.mbox > spam-100-clean.mbox
User=0.660 System=0.330 Wall=0:01.09 (U+S)/W=90.8%

That's about 75x faster for procmail.

Yes, but your procmail version doesn't handle as many possible variants
of SA markup as "spamassassin -d" would.
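To make "variants of SA markup" concrete, here is a minimal sketch of a per-message filter that undoes only the two simplest kinds, assuming SA added X-Spam-* header fields and may have prefixed the Subject with the common "*****SPAM*****" tag. It is a hypothetical illustration (the function name strip_sa_markup is made up), not the contents of ~/remove_sa_markup.rc and not what "spamassassin -d" does internally. In particular it does not unwrap spam that SA encapsulated as an attachment (report_safe), which is the sort of variant "spamassassin -d" is designed to undo and a hand-rolled filter tends to miss.

```shell
# strip_sa_markup (hypothetical name): read ONE message on stdin and write it
# to stdout with simple SpamAssassin header markup removed.
# Assumptions: SA added X-Spam-* header fields and may have prefixed the
# Subject with the tag "*****SPAM*****". Encapsulated (report_safe) spam is
# NOT unwrapped, and X-Spam-* fields added by other tools are removed too.
strip_sa_markup() {
  awk '
    BEGIN { in_hdr = 1; skip = 0 }
    # The first empty line ends the header; everything after is body.
    in_hdr && /^$/ { in_hdr = 0 }
    in_hdr {
      # A line starting with space/tab continues the previous field,
      # so drop it if that field is being dropped.
      if ($0 ~ /^[ \t]/) { if (!skip) print; next }
      skip = 0
      if (tolower($0) ~ /^x-spam-[^:]*:/) { skip = 1; next }
      # Undo the subject rewrite: "Subject: *****SPAM***** foo" -> "Subject: foo"
      sub(/^[Ss]ubject:[ \t]*\*\*\*\*\*SPAM\*\*\*\*\*[ \t]*/, "Subject: ")
      print
      next
    }
    { print }
  '
}
```

Since formail -s runs its command once per message, the function can be applied to a whole mailbox by sourcing it from a file (strip_sa_markup.sh is a made-up name):

  formail -s sh -c '. ./strip_sa_markup.sh && strip_sa_markup' < old > new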


Agreed. I'm looking into that a bit to see whether I can get the result
to match SA's more closely. I'm not sure that I'll go to the effort of
reading SA's config file to understand subject rewrite tags, but maybe.
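One way to measure how close the recipe gets to SA is to run both filters over the same mailbox and diff the results. The helper below is a hypothetical sketch (the name compare_cleaned and the file names are made up); it assumes both outputs were produced by formail -s from the same input, so messages appear in the same order in each file.

```shell
# compare_cleaned (hypothetical): $1 = mbox produced with "spamassassin -d",
# $2 = mbox produced by the procmail recipe, both from the same input.
# Prints message counts, then the first differences; returns 0 only if
# the two files are byte-for-byte identical.
compare_cleaned() {
  ref=$1
  mine=$2
  # Each mbox message starts with a "From " line (body lines that begin
  # with "From " are normally escaped as ">From "), so this approximates
  # the message count.
  nref=$(grep -c '^From ' "$ref")
  nmine=$(grep -c '^From ' "$mine")
  echo "messages: $nref (spamassassin -d) vs $nmine (recipe)"
  if cmp -s "$ref" "$mine"; then
    echo "outputs identical"
    return 0
  fi
  # Show where the recipe output first diverges from SA output.
  diff -u "$ref" "$mine" | head -n 40
  return 1
}
```

For example:

  formail -s spamassassin -d < spam-100-msgs.mbox > ref.mbox
  formail -s procmail ~/remove_sa_markup.rc < spam-100-msgs.mbox > mine.mbox
  compare_cleaned ref.mbox mine.mbox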

I'd also be curious -- grab PPerl from CPAN, install it, and try
formail -s pperl `which spamassassin` -d < old > new

Good idea. Here are the results:

% time formail -s pperl `which spamassassin` -d < spam-100-msgs.mbox > spam-clean.mbox
User=0.700 System=0.200 Wall=0:43.84 (U+S)/W=2.0%

[the cpu time is not meaningful because of the nature of pperl's
implementation]

That's roughly half the wall clock time (about 47% less), or a 1.88x speed
improvement. BTW, this is running the latest svn (development) version of
SA, which has some improvements in the markup removal if I recall correctly
(or maybe they were planned but not implemented; I don't recall).
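The "less wall clock time" and "speed improvement" figures are two ways of stating the same measurement. Writing T_old for the wall-clock time of plain "spamassassin -d" (not quoted in this message) and T_new for the PPerl run, a speedup s corresponds to the fractional time saving below, so 1.88x works out to about 47% less wall-clock time, i.e. roughly half.

```latex
s = \frac{T_{\text{old}}}{T_{\text{new}}}, \qquad
1 - \frac{T_{\text{new}}}{T_{\text{old}}} = 1 - \frac{1}{s}
  = 1 - \frac{1}{1.88} \approx 0.468
```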




_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail