procmail

eGroups ads

2000-08-30 21:41:07
OK, here's what happens when someone crosses the border from annoying to
obnoxious. Park this on your hard disk. Oh, and yeah, don't tell eGroups...

Volker


##### strip_egroups_ad.rc
#
# Resource file for procmail. Run with:
#   INCLUDERC=yourpath/strip_egroups_ad.rc
# in your $HOME/.procmailrc
#
# Removes the first ad in eGroups list emails.
# First and last line of ad are matched by $start and $end, and some string
# inside the ad by $admatch. The whole ad is replaced by $note.
#
# The usual headache with bare-bones Unix-rubbish: the sed solution never(!)
# works under Solaris 2.7 because -e handles neither newlines nor nested
# '{ }'-lists. Solaris awk is also too dumb - nawk is required.
# Needless to say, the GNU tools never have a problem. Long live the Penguin!
#
# In the public domain.
# Volker Kuhlmann <v.kuhlmann@elec.canterbury.ac.nz>
#   31 Aug 2000
#

:0
* ^Delivered-To:.*@egroups\.com
* ^Mailing-List:.*@egroups\.com

# With awk (change nawk to gawk etc. if necessary):
#
{
  end='...--------------------.*--------------------[-~>=_|e]*$'
  start="^$end"
  admatch='http:\/\/.*\.egroups\.com\/.*\/'
  # awk prints $note verbatim, so no backslash escapes here
  note='[obnoxious eGroups ad removed]'
  :0 fbw
  | nawk "\
      BEGIN { ad=0; done=0 }\
      done { print; next }\
      ad && \$0 ~ \"$end\" { \
        ad=0; \
        if (match(text,\"$admatch\")) {\
          print \"$note\"; done=1\
        } else {\
          print text \$0; text=\"\" }\
        next\
      }\
      \$0 ~ \"$start\" { ad=1 }\
      ad { text=text \$0 \"\n\" }\
      !ad { print }\
      END { if (ad) printf \"%s\", text }\
      "
}

# With sed:
#
# Write $start and $end to match the whole line, but do not(!) anchor $end at
# the start of the line using "^".
# The /$admatch/! condition is necessary to remove the first ad line, in case
# $end also matches the start line.
# The conditions should be reasonably broad, so that they still match if
# eGroups changes some characters in the lines.
#
# Adapted from the sed FAQ:
#   :t
#   /BLOCK_TOP/,/BLOCK_END/ {
#     /BLOCK_END/! { N; b t; }
#     /regex/s/^.*BLOCK_END//
#   }
#   Suppose the beginning of the block is indicated by 'BLOCK_TOP' and
#   the end of the block is indicated by 'BLOCK_END'. If the expression
#   'regex' appears anywhere within the block, the entire block should
#   be deleted.
# The most difficult part was to get the quoting right for procmail...
#
#{
#  end='...-\{20,\}.*-\{20,\}[-~>=_|e]*$'
#  start="^$end"
#  admatch='http:\/\/.*\.egroups\.com\/.*\/'
#  note='\[obnoxious eGroups ad removed\]'
#  :0 fbw
#  | sed \
#    -e ':t' \
#    -e "/$start/,/^$end/ {        \
#          /^.*$end/! { N; b t; }; \
#         /$admatch/! { N; b t; }; \
#         /$admatch/ {             \
#           s/^.*$end/$note/ ;     \
#           :tt;                   \
#           n; b tt;               \
#          };                      \
#       }"
#}

##### EOF strip_egroups_ad.rc
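If you want to check what the awk filter above does without going through procmail, here is a minimal Python sketch of the same state machine. It is only an illustration and does not belong in your .procmailrc. The patterns are copied from the awk version's $end, $start and $admatch. re.DOTALL is used because awk's "." also matches the newlines inside the collected block text. The function name strip_first_ad is made up for this sketch. Passing an unterminated block through unchanged at end of input, rather than dropping it, is also an assumption of the sketch.

```python
import re

# Patterns copied from the awk version of strip_egroups_ad.rc.
# END_RE ($end): 3 arbitrary chars, 20 dashes, anything, 20 dashes,
# then optional trailing decoration characters up to end of line.
END_RE = re.compile(r"...--------------------.*--------------------[-~>=_|e]*$")
# START_RE ($start) is the same pattern anchored at the start of the line.
START_RE = re.compile(r"^" + END_RE.pattern)
# AD_RE ($admatch); DOTALL because awk's "." also matches newlines.
AD_RE = re.compile(r"http://.*\.egroups\.com/.*/", re.DOTALL)
NOTE = "[obnoxious eGroups ad removed]"


def strip_first_ad(lines):
    """Replace the first ad block in `lines` (no trailing newlines) by NOTE."""
    out = []
    block = []        # lines collected since the current start line
    in_block = False  # plays the role of awk's `ad`
    done = False      # awk's `done`: after the first ad, copy everything
    for line in lines:
        if done:
            out.append(line)
            continue
        if in_block and END_RE.search(line):
            in_block = False
            if AD_RE.search("\n".join(block)):
                out.append(NOTE)      # drop the block and its end line
                done = True
            else:
                out.extend(block)     # not an ad: put the block back
                out.append(line)
            block = []
            continue
        if START_RE.search(line):
            in_block = True
        if in_block:
            block.append(line)
        else:
            out.append(line)
    out.extend(block)  # sketch's choice: keep an unterminated block
    return out
```

Feed it a message body split into lines, e.g. strip_first_ad(body.splitlines()), and join the result back with newlines. Only the first block that contains an eGroups link is replaced. Every later line, including further ads, is copied through untouched, just like the awk version's `done` flag.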

_______________________________________________
procmail mailing list
procmail@lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
