procmail

Re: filtering/replacing keywords inside of email

2000-09-26 09:30:13
"William @ Smart Guys" wrote:
> I'm looking for a way to replace certain keywords (i.e., offensive words that
> my clients want filtered) in an email with asterisks.  E.g., the f word would
> be replaced with ****
> (i.e., one * for each character of the word).

This strikes me as rather difficult to do in the general case.  

Are there MIME mails that you want to do this with?  HTML or MS-Word
documents?  

If all the mail to be modified is plain text only, never encoded, and
never multipart, then something like the recipe below might work.
Limitations include:
1. document must contain spaces, not tabs
2. the sed script must not exceed the limitations of your shell/sed
3. probably some others that I don't know 

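    # Flags: b = pass only the body to the filter, f = treat the pipe as a
    # filter, w = wait for it and keep the original body if it fails.
    # Skip multipart and encoded mail; pad each line with a space at both
    # ends so keywords at the start or end of a line still have a
    # non-letter neighbour, then strip the padding again.
    :0 bfw
    * ! ^content-type:.*multipart
    * ! ^content-transfer-encoding:.*quoted-p
    * ! ^content-transfer-encoding:.*base
    * ! ^content-transfer-encoding:.*x-uu
    | sed -e 's/^/ /' -e 's/$/ /' | sed -f /path/to/replace.keywords.sed | \
        sed -e 's/^ //' -e 's/ $//'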

where the script "replace.keywords.sed" contains something like this
(say your keywords were "dork" and "xyzzy"):

    s/\([^a-zA-Z]\)dork\([^a-zA-Z]\)/\1****\2/g
    s/\([^a-zA-Z]\)xyzzy\([^a-zA-Z]\)/\1*****\2/g
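
To see why the padding steps matter, here is a small sketch that runs the
"dork" line from the script above on sample text from the command line.
The function names pad, censor and unpad are made up for this
demonstration and are not part of procmail or sed. It also shows one more
limitation: two keywords separated by a single character share that
character, so a single pass replaces only the first of them.

```shell
#!/bin/sh
# Helper names (pad, censor, unpad) are hypothetical, for illustration only.
pad()    { sed -e 's/^/ /' -e 's/$/ /'; }
unpad()  { sed -e 's/^ //' -e 's/ $//'; }
censor() { sed -e 's/\([^a-zA-Z]\)dork\([^a-zA-Z]\)/\1****\2/g'; }

# Without padding, a keyword touching either end of the line is missed.
printf 'dork is a dork\n' | censor                     # prints: dork is a dork

# With padding, both occurrences are replaced.
printf 'dork is a dork\n' | pad | censor | unpad       # prints: **** is a ****

# Adjacent keywords share the separating space, so one pass replaces
# only the first; running the substitution twice catches the second.
printf 'dork dork\n' | pad | censor | unpad            # prints: **** dork
printf 'dork dork\n' | pad | censor | censor | unpad   # prints: **** ****
```

If adjacent keywords matter to you, one option is simply to run the
keyword script twice in the recipe's pipeline.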

You'd have to create a separate line for each keyword.  You'll also
need separate lines in the sed script for keywords with different
capitalizations (e.g., I have only one line for "dork"; you'd need
one each for "Dork" and "DORK") and for variant forms (e.g., if you
also want to change "DORKS GALORE" to "***** GALORE" you'd need to
modify the regex some) ...  Maybe you should use perl rather than sed
for this?
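
Short of switching to perl, one way to avoid hand-writing a line per
capitalization is to generate the sed script from a keyword list.  The
sketch below is only that, a sketch: make_keyword_sed is a made-up name,
and it assumes each keyword is made of plain letters (no "/", "[", "]"
or other characters special to sed).  It turns every letter into a
two-case bracket such as [dD] and emits one asterisk per character.

```shell
#!/bin/sh
# Hypothetical helper: read one keyword per line on stdin and print one
# sed substitution per keyword.  Assumes keywords contain letters only.
make_keyword_sed() {
    awk '
    NF {
        pat = ""; stars = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # Match either case of this letter, e.g. d -> [dD]
            pat = pat "[" tolower(c) toupper(c) "]"
            stars = stars "*"
        }
        printf "s/\\([^a-zA-Z]\\)%s\\([^a-zA-Z]\\)/\\1%s\\2/g\n", pat, stars
    }'
}

# Print the generated script for the two example keywords.
printf 'dork\nxyzzy\n' | make_keyword_sed
# First line printed:
#   s/\([^a-zA-Z]\)[dD][oO][rR][kK]\([^a-zA-Z]\)/\1****\2/g
```

Redirecting that output to /path/to/replace.keywords.sed would give the
recipe a script that also catches "Dork", "DORK" and mixed-case forms.
Variant forms such as plurals still need their own keyword lines.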

> Is there any way to do this with procmail?  If not, are there any products
> out there that would do this?

I think it would be very hard to do in the general case.  And if
somebody sent you a C program that contained variable names matching
your keywords, then you might find

     ***** = TRUE;
     /* If the user says *****/blahblah then bump num_***** */
     num_*****++;

in your source :^<

hth,
-- 
Neither I nor my employer will accept any liability for any problems
or consequential loss caused by relying on this information.  Sorry.
Collin Park                         Not a statement of my employer.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
