procmail

Re: Zapping repeated .sig appendices

1996-09-03 19:41:58
On Tue, 03 Sep 1996 09:08:36 -0700, "Alan Stebbens <stebbens(_at_)sgi(_dot_)com>"
<stebbens(_at_)anywhere(_dot_)engr(_dot_)sgi(_dot_)com> wrote:
> > On Sun, 1 Sep 1996, Peggy Wilkins wrote:
> > > In my opinion, it's better to educate people to edit out unnecessary
> > > text in their replies.  They should be cutting out the sigs before
> > > they send their messages, in addition to cutting text that is not
> > > directly relevant to their reply.  I am genuinely surprised that
> > > people are not taught to do this.
> > i certainly agree that it would be far more courteous for people to do
> > what you describe, but given the difficulty of breaking millions of people
> > of (what we consider to be) their bad habits, why not try to solve the
> > problem another way: with procmail?
> <... horrendous .sig trimmed ...>
> Your signature is an excellent example of why it would be very difficult
> to automatically remove signatures from incoming e-mail.  Please define
> a "rule" by which a program can determine that your e-mail message ends,
> and your signature starts.  Please do not make the rule specific to
> *your* signature, but make it general, so it can be applied to my
> signature, and all of your other correspondents' signatures.

Here's something I've been planning to implement, in pseudo code: 

 1. Start from end of message

 2. If current line matches /^-- $/, snip from here to end of message
    and stop.

 3. If the current line doesn't look like reasonable natural language
    text, mark it as subject to snipping and remember where it is.
    (Caution: don't mark a line just because it's indented or contains
    "foreign" characters or a lot of punctuation. The author could be
    #&/%*f! swearing, or providing a URL, or citing a Japanese poem in
    the original language in shift-JIS kanji.)
    This could probably benefit from scoring. Lots of spaces in the
    middle of a line are a good indication, as well as lots of dashes
    or indeed any punctuation repeated a lot. You could probably have
    a pretty high score for anything that matches an "ASCII art"
    recipe and a lower score for something that looks pretty much like
    ordinary text. 
    The basic idea of this "mark and ponder" algorithm is to leave
    some tolerance for stuff like "line of dashes, two lines of prose,
    line of dashes" types of signatures. If you collect more than,
    say, four lines of human-readable text, give up and go back to the
    last place that had a higher score and snip from there to the end
    of the message.

 4. Else, if it's an empty line, mark it for snipping, too.

 5. If a line was just marked, look at the previous line now. Start
    over at 2. (Should probably give up anyway at around halfway up
    the message, but then that would miss the "<aol>Me too!</aol>
    [fifty lines of .sig]" messages. Hopefully those will never make
    it to my mailbox and/or newsreader in the first place ;-)

 6. Else, this is the last text line. Snip any marked lines below this
    one. Stop. 
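
For what it's worth, the steps above could be sketched roughly like this
in Python. This is only one possible reading of the pseudo code, not a
tested mail filter. The names art_score and trim_signature, the two
regular expressions, the score weights and the ART_THRESHOLD cutoff are
all assumptions made up for illustration; only the "-- " delimiter and
the "four lines of prose" tolerance come from the list. It expects the
body as a list of lines without trailing newlines.

```python
import re

SIG_DELIMITER = re.compile(r"^-- $")  # step 2: conventional sig separator
MAX_PROSE_RUN = 4   # step 3: tolerate up to four readable lines in a row
ART_THRESHOLD = 2   # assumed cutoff for "does not look like prose"


def art_score(line):
    """Crude, made-up 'ASCII art' score; higher means less like prose."""
    score = 0
    if re.search(r"\S {3,}\S", line):        # lots of spaces mid-line
        score += 2
    if re.search(r"([^\w\s])\1{3,}", line):  # same punctuation 4+ times
        score += 3
    return score


def trim_signature(lines):
    """Scan bottom-up and return the lines with a trailing .sig removed."""
    cut = len(lines)   # everything from this index on is marked for snipping
    prose_run = 0      # readable lines seen since the last marked line
    for i in range(len(lines) - 1, -1, -1):
        line = lines[i]
        if SIG_DELIMITER.match(line):
            return lines[:i]                 # step 2: snip and stop
        if line.strip() == "":
            if i == cut - 1:                 # step 4: blank line directly
                cut = i                      # above the marked region
        elif art_score(line) >= ART_THRESHOLD:
            cut = i                          # step 3: mark, remember position
            prose_run = 0
        else:
            prose_run += 1
            if prose_run > MAX_PROSE_RUN:    # too much prose: give up and
                break                        # fall back to the last mark
    return lines[:cut]                       # step 6: snip marked lines
```

With these made-up weights, a message like "Me too!", a blank line, and
a three-line box of dashes comes back as just "Me too!", while a short
message with no .sig at all is left alone. The "give up halfway up the
message" safeguard from step 5 is not included here.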

I'm afraid my own bastardly .sig might defy most of the above ideas ...
But then it contains an IMPORTANT message. <g>

/* era */

-- 
See <http://www.ling.helsinki.fi/~reriksso/> for mantra, disclaimer, etc.
* If you enjoy getting spam, I'd appreciate it if you'd register yourself
  at the following URL:  <http://www.ling.helsinki.fi/~reriksso/spam.html>
