At 14:00 2002-07-07 -0700, GreenTree Ground Station did say:
> here might have info on some kind of ratio calculator
> that might work in conjunction with procmail to check
We discussed this a few months back when I was developing a number of
rulesets for a mailing list preprocessor. The following code is basically
what I'm using now. It supports list-specific thresholds - which I get
from a line extracted from a file, but you could mimic for a standalone
filter like so:
# As extracted, the line starts with a listname, so every option is
# expected to have a leading space as its delimiter.
# BLOATQUOTE (the FILTER_ID below) must be listed for the filter to run.
FILTER_OPTIONS=" BLOATQUOTE BLOATOK BLOAT_IN=120"
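To make the option handling concrete, here is a rough sketch of how such an options string breaks down into bare flags and numeric overrides. It is written in Python rather than procmail only so it can be run by itself. The helper name parse_options is hypothetical, and the defaults are copied from the ruleset further down.

```python
import re

# Defaults copied from the ruleset below; names match its variables.
DEFAULTS = {"BLOAT_IN": 80, "BLOAT_QU": 10, "BLOAT_NW": 14, "BLOAT_BL": 5}


def parse_options(options, defaults=DEFAULTS):
    """Split an options string into bare flags and numeric thresholds.

    parse_options is a hypothetical helper, not part of procmail.
    """
    thresholds = dict(defaults)
    for name in defaults:
        # Same shape as the recipe condition: [ ]NAME=\/[0-9]+
        match = re.search(r"[ \t]" + re.escape(name) + r"=([0-9]+)", options)
        if match:
            thresholds[name] = int(match.group(1))
    # Anything without "=" is treated as a bare flag (BLOATOK and so on).
    flags = {token for token in options.split() if "=" not in token}
    return flags, thresholds


flags, thresholds = parse_options(" BLOATQUOTE BLOATOK BLOAT_IN=120")
print(sorted(flags))                                   # ['BLOATOK', 'BLOATQUOTE']
print(thresholds["BLOAT_IN"], thresholds["BLOAT_QU"])  # 120 10
```

Any threshold that is not named in the string keeps its default, which is what the override recipes in the ruleset do one variable at a time.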
This filter recognizes a few different quoting marks [:|>], and doesn't
assume any special quote format. Some braindead MUAs - or their users - do
things like putting a bracket at the beginning of a quoted section, and
another at the end:
> several
lines of text
terminated with <
Good luck making any sense of that BS when so many people don't even do
that consistently (their OWN text sometimes appears in response WITHIN the
content they quote in that fashion).
The ruleset:
# First, determine if we should be COPYING bloated messages or not
# (that is, are these just warnings?). Not only defines if a copy
# is processed, but also affects the advisory message sent.
:0
* $ FILTER_OPTIONS ?? [ ]BLOATOK\>
{
BLOATCOPY=c
}
#[snip - other optional detection method using a list-added footer banner]
# Define the filter ID
FILTER_ID="BLOATQUOTE"
# The decidedly more involved method - the conditions were pulled from
# <http://pm-doc.sourceforge.net/pm-tips-body.html#195>
# "14.3 Excessive quoting of message"
# X-Loop must match what is being used elsewhere
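# NOTE: the E ("else") flag chains this recipe onto the snipped footer
# recipe above. If you leave that recipe out, use a plain ":0" here,
# otherwise this block is skipped whenever the BLOATOK recipe matches.
:0E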
* $ FILTER_OPTIONS ?? [ ]$FILTER_ID\>
* ! ^X-Loop:[ ]+$LOOPALERT
* ! ^FROM_MAILER
{
# Each body line is scored as one of three types:
# - quoted lines
# - non-blank, non-quoted lines
# - completely blank lines
# Establish defaults for all lists (used unless overridden)
# INitial credit (-)
BLOAT_IN=80
# QUote cost (+)
BLOAT_QU=10
# NeW line credit (-)
BLOAT_NW=14
# BLank line credit (-)
BLOAT_BL=5
# locate option values for scoring (same variable as tested above)
:0
* FILTER_OPTIONS ?? [ ]BLOAT_IN=\/[0-9]+
{
BLOAT_IN=$MATCH
}
:0
* FILTER_OPTIONS ?? [ ]BLOAT_QU=\/[0-9]+
{
BLOAT_QU=$MATCH
}
:0
* FILTER_OPTIONS ?? [ ]BLOAT_NW=\/[0-9]+
{
BLOAT_NW=$MATCH
}
:0
* FILTER_OPTIONS ?? [ ]BLOAT_BL=\/[0-9]+
{
BLOAT_BL=$MATCH
}
# start with a zero extra score
addscore=0
:0
* ^X-Mailer:[ ]*Microsoft Outlook
{
# Compute Outlook adjustments
:0
* $ $BLOAT_QU^0
* $ $BLOAT_NW^0
{
BLOAT_OL=$=
}
VERBOSE=ON
# little extra check - MS uses a non-conventional
# way of quoting -- the buggers include most of the header
# of the original message...
# The count-against values used here equal the value a
# quoted header would NORMALLY have, *PLUS* the value that
# the next rule will grant these same lines because it
# thinks they're NOT quoted lines. This still doesn't
# accommodate the extra blanks in there, but by and large
# should deal with the overall quoting scheme anyway. Only
# triggers if the initial "original message" line is found.
:0B
* ^[ ]*----- Original Message -----
* $ $BLOAT_OL^0
* $ $BLOAT_OL^1 ^[ ]*(From|To|Sent|Subject):
{
# note the score, so we can add it in the generic
# check
addscore=$=
}
}
# see the details of the logic - then turn this off when you
# understand it
VERBOSE=ON
# initial line between header and body is counted as one of the
# blank lines when issuing credits.. [snip] lines and the sort
# are as well..
:0B$BLOATCOPY
* $ -$BLOAT_IN^0
* $ $BLOAT_QU^1 ^[ ]*[>|:]
* $ -$BLOAT_NW^1 ^[ ]*[^>|: ]
* $ -$BLOAT_BL^1 ^[ ]*$
* $ $addscore^0
{
### What follows is diagnostic stats - you can omit it.
BOUNCENOTES="Scored"
# note the final (positive) score, so
# we can add it to the advisory header
BOUNCENOTES="$BOUNCENOTES ($= weight)"
# Okay, we know we're going to bounce, but let's just
# re-compute the values for each line (sadly, we can't
# store results from each condition above)
VERBOSE=OFF
:0
* ^X-Mailer:[ ]*Microsoft Outlook
{
:0B
* ^[ ]*----- Original Message -----
* 1^0
* 1^1 ^[ ]*(From|To|Sent|Subject):
{
addscore=$=
}
}
:0B
* 1^1 ^[ ]*[>|:]
* $ $addscore^0
{
BOUNCENOTES="$BOUNCENOTES ($= quoted)"
}
:0B
* 1^1 ^[ ]*[^>|: ]
{
BOUNCENOTES="$BOUNCENOTES ($= new)"
}
:0B
* 1^1 ^[ ]*$
{
BOUNCENOTES="$BOUNCENOTES ($= blank)"
}
LOG="BOUNCENOTES: $BOUNCENOTES$NL"
### End diagnostic stats.
# choose an advisory text. Note that if you want two
# different messages - one for those which are being
# permitted ("warning") and those which are refused, you
# can use the BLOATCOPY flag ('c') to differentiate the
# two files.
BOUNCEMSG=bloat${BLOATCOPY}.msg
BOUNCESUBJ="Excessive message quoting"
# Include some generic bounce handler code
# (or you'd cram your own handler right here)
INCLUDERC=bouncer.rc
}
VERBOSE=OFF
}
> the percentage of 'quoted content' in an email post
> and act accordingly.
The above doesn't weigh the byte count of the lines, just the raw line
counts, assigning different weights to different types of lines, as per
the logic for a mailing list.
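As a worked illustration of that arithmetic, here is a minimal sketch in Python (not procmail, so it can be run standalone). The weights are the defaults from the ruleset. The function names are hypothetical, the patterns only approximate the procmail conditions, and the sketch ignores the header/body separator line that the recipe counts as a blank.

```python
import re

# Default weights copied from the ruleset above.
DEFAULTS = {"BLOAT_IN": 80, "BLOAT_QU": 10, "BLOAT_NW": 14, "BLOAT_BL": 5}

# Approximations of the recipe conditions ^[ ]*[>|:] and ^[ ]*$
QUOTED = re.compile(r"^[ \t]*[>|:]")
BLANK = re.compile(r"^[ \t]*$")


def classify(line):
    """Return 'quoted', 'blank' or 'new' for a single body line."""
    if QUOTED.match(line):
        return "quoted"
    if BLANK.match(line):
        return "blank"
    return "new"


def bloat_score(body, weights=DEFAULTS, addscore=0):
    """Return (score, counts); a positive score means 'too much quoting'."""
    counts = {"quoted": 0, "new": 0, "blank": 0}
    for line in body.split("\n"):
        counts[classify(line)] += 1
    score = (-weights["BLOAT_IN"]
             + weights["BLOAT_QU"] * counts["quoted"]
             - weights["BLOAT_NW"] * counts["new"]
             - weights["BLOAT_BL"] * counts["blank"]
             + addscore)
    return score, counts


# Twelve quoted lines followed by a two-line reply and two blank lines.
body = "\n".join(["> quoted"] * 12 + ["", "reply one", "reply two", ""])
print(bloat_score(body))  # (2, {'quoted': 12, 'new': 2, 'blank': 2})
```

With the defaults that comes to -80 + 12*10 - 2*14 - 2*5 = 2, which is just positive, so this message would trip the filter. One more line of new text would bring it back under.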
> Mainly for the first couple times it finds 'excessive quoting',
If you want to add count logic, you're free to do that. I'm planning on
doing that with the above -- automatically shifting from warnings to "we've
explained it several times, but since you haven't clued in, here's some
incentive..."
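One way that count logic might look, sketched in Python purely to show the decision. Everything here is an assumption: the three-warning limit, the names, and the in-memory counter. In a real procmail setup the tally would have to live in a file or database between deliveries.

```python
from collections import Counter

# Hypothetical policy: warn this many times per sender, then refuse.
MAX_WARNINGS = 3


def next_action(offenses, sender, max_warnings=MAX_WARNINGS):
    """Record one offense for sender and return 'warn' or 'refuse'."""
    offenses[sender] += 1
    return "warn" if offenses[sender] <= max_warnings else "refuse"


offenses = Counter()
for _ in range(5):
    print(next_action(offenses, "someone@example.com"))
```

Here 'warn' would correspond to running with BLOATCOPY=c (the message still goes through) and 'refuse' to running with it unset.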
> PS: I've searched the net, but didn't see anything
> relating along these lines.
I'd start with the procmail list archives and its FAQ. The basic logic of
the above is from the PM Tips document (URL included in the code).
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.