Because high-bit text can get destroyed in transit, I'm appending
the real filter as a file attachment. Upload it as binary to a unix
system and unzip it there. If anybody can't get their hands on unix
unzip, I can email it in other compressed formats.

The following is a pseudo-code representation of my filter. Each "#"
stands for a high-bit character in the real thing. The first row is
CHR(128)..CHR(159). I haven't seen anything in that region, but the
filter watches for it anyway. The next rows are CHR(160)..CHR(191),
CHR(192)..CHR(223), and CHR(224)..CHR(255) respectively. The last row
looks for "quoted-printable" versions of high-bit characters...
:0BDfh
* -1^1 .
* 19^1 [################################]
* 19^1 [################################]
* 19^1 [################################]
* 19^1 [################################]
* 57^1 =[89A-F][0-9A-F]
| formail -A "X-Reject: Too many foreign characters."
If more than about 5% of the body (strictly, more than 1 character
in 19) consists of unwanted characters, the email is flagged. If you
want to divert it to a file immediately instead, get rid of the "fh"
flags and replace the formail line with the name of the junkmail
file, like so...
:0BD
* -1^1 .
* 19^1 [################################]
* 19^1 [################################]
* 19^1 [################################]
* 19^1 [################################]
* 57^1 =[89A-F][0-9A-F]
junkmail
Here's the logic (if you're unfamiliar with procmail "scoring",
read "man procmailsc").
-> "* -1^1 ." - subtract 1 from the accumulator for each character
   in the body.
-> next 4 lines - add 19 to the accumulator for each forbidden
   character in the body.
-> last condition line - add 57 (3 x 19) to the accumulator for each
   group of 3 consecutive characters in the body that form a
   "quoted-printable" character in the range "=80" through "=FF".
If the final result is positive, the action at the bottom of the
recipe is executed.
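To see how the weights work out, here is a small Python sketch of the same arithmetic. It is not procmail and not part of the filter. The function name `foreign_score` and the sample bodies are made up for illustration, and it assumes the body is raw bytes and that newlines are not counted (procmail's "." does not match a newline).

```python
import re

# Quoted-printable escapes for bytes 0x80..0xFF: "=80" through "=FF".
# Case-sensitive, like the "D" flag on the recipe.
QP_HIGH = re.compile(rb"=[89A-F][0-9A-F]")


def foreign_score(body: bytes) -> int:
    """Rough model of the recipe's score for a message body (illustrative only)."""
    chars = sum(1 for b in body if b != 0x0A)  # "* -1^1 ." (newlines not counted)
    high = sum(1 for b in body if b >= 0x80)   # the four "* 19^1 [...]" rows
    qp = len(QP_HIGH.findall(body))            # "* 57^1 =[89A-F][0-9A-F]"
    return -chars + 19 * high + 57 * qp


# 100 characters, 5 of them high-bit: exactly 5% does not trip the filter.
print(foreign_score(b"a" * 95 + b"\xe9" * 5))  # -100 + 95 = -5
# 100 characters, 6 of them high-bit: over 1 in 19, so the recipe would fire.
print(foreign_score(b"a" * 94 + b"\xe9" * 6))  # -100 + 114 = 14
# Two quoted-printable escapes (6 characters) weigh the same as 6 raw ones.
print(foreign_score(b"a" * 94 + b"=E9=E9"))    # -100 + 114 = 14
```

The break-even point is one unwanted character in every 19, which is where the "about 5%" figure comes from. The 57 is just 3 x 19, since a quoted-printable escape takes up 3 characters of the body.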
*BUT WHAT ABOUT LEGITIMATE NON-ENGLISH EMAIL?* I assume you're
talking about European languages that have some accents. You can
add lines to subtract 19 for each acceptable high-bit character and
57 for the quoted-printable version of that character. This will
offset the false matches in the recipe.
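The effect of those offsetting lines can be sketched in Python as well. This is only a model of the arithmetic, not a recipe. `score_with_allowed` and `ALLOWED` are hypothetical names, and the three byte values assume ISO-8859-1 text (e-acute, e-grave, u-umlaut), so substitute whatever your correspondents really use.

```python
import re

# Capture the two hex digits of a high-bit quoted-printable escape.
QP_HIGH = re.compile(rb"=([89A-F][0-9A-F])")

# Hypothetical whitelist: e-acute, e-grave, u-umlaut in ISO-8859-1.
ALLOWED = bytes([0xE9, 0xE8, 0xFC])


def score_with_allowed(body: bytes, allowed: bytes) -> int:
    """Model the recipe plus offsetting -19 / -57 lines for `allowed` bytes."""
    # Every non-newline character costs 1, whitelisted or not.
    score = -sum(1 for b in body if b != 0x0A)
    # An allowed character gets +19 and then -19, so only the others add up.
    for b in body:
        if b >= 0x80 and b not in allowed:
            score += 19
    # Same idea for the quoted-printable form: +57 and -57 cancel out.
    for m in QP_HIGH.finditer(body):
        if int(m.group(1), 16) not in allowed:
            score += 57
    return score


body = b"a" * 90 + b"\xe9" * 10
print(score_with_allowed(body, b""))       # no offsets: -100 + 190 = 90
print(score_with_allowed(body, ALLOWED))   # offsets applied: -100
```

With the offsets in place, a message full of accepted accents scores the same as plain ASCII of the same length, while any other high-bit characters still count against it.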
--
Walter Dnes <waltdnes(_at_)waltdnes(_dot_)org>
http://www.waltdnes.org <SpamDunk Project procmail spamfilters>
FOREIGN.ZIP
Description: Binary data