John Oliver wrote:
> There are some systems which just relay mail where SpamAssassin isn't an
> option. I know there's gotta be a way to do this in procmail... isn't
> there a header that tells what character set the content is, and a list
> of valid character sets somewhere?

The character set can be determined from the message headers for plain
(single-part) messages, or from the headers of each part for multipart
messages: look for the Content-Type header and parse out the charset
parameter. ("English character set" is somewhat misleading, as you probably
want to allow Latin-1 and UTF-8 as well, neither of which is an English
character set.)
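
As a rough illustration of "look at the Content-Type of every part", here
is a minimal sketch in Python using the standard email package. The
function name declared_charsets is hypothetical, and the whole thing
assumes a Python interpreter is available where the mail is handled, which
may not hold on a bare relay:

```python
import email
from email import policy


def declared_charsets(raw: bytes) -> list[str]:
    """Return the charset parameter of every leaf part, lowercased.

    Hypothetical helper: walks a single-part or multipart message and
    collects the charset= parameter from each part's Content-Type header.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    charsets = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers hold other parts, not text
        cs = part.get_content_charset()  # None if no charset parameter
        if cs is not None:
            charsets.append(cs)
    return charsets
```

For a multipart/alternative message with a US-ASCII text part and an
ISO-8859-1 HTML part this returns ["us-ascii", "iso-8859-1"]; a message
with no charset parameter at all yields an empty list.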

But then, what should you allow? For example, assuming you want to block
Chinese text, filtering on (big5|gb-?1988|gb-?2312) will catch some mail,
but there is no reason why someone could not send Chinese text in UTF-8 or
UTF-16, or as an HTML message in ISO-8859-1 (the most commonly used
encoding for Web pages) or even in the "English character set" US-ASCII,
with the Chinese characters encoded as numeric entities such as &#20013;.
And if a message contains multiple parts in different encodings, should it
be accepted or rejected?
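
To make the difference between "declared charset" and "actual content"
concrete, a small Python sketch follows. The names BLOCKED_CHARSET, is_cjk
and cjk_ratio are hypothetical, and restricting "Chinese" to the basic CJK
Unified Ideographs block (U+4E00 to U+9FFF) is a simplifying assumption;
real text can also use extension blocks and CJK punctuation:

```python
import html
import re

# Charset names from the pattern above; used with fullmatch, so "gb2312"
# matches but "utf-8" or "x-gb2312" do not.
BLOCKED_CHARSET = re.compile(r"big5|gb-?1988|gb-?2312", re.IGNORECASE)


def is_cjk(ch: str) -> bool:
    # Assumption: basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs,
    after expanding HTML numeric entities such as &#20013;."""
    expanded = html.unescape(text)
    chars = [c for c in expanded if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk(c) for c in chars) / len(chars)
```

A UTF-8 part containing "中文" passes the charset check
(BLOCKED_CHARSET.fullmatch("utf-8") is None) yet has a cjk_ratio of 1.0,
and so does the pure US-ASCII string "&#20013;&#25991;". Where to put the
threshold, and whether one suspicious part condemns the whole message, is
exactly the policy decision raised above.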

All of this is perfectly doable with procmail; you just need to decide
exactly what you want to filter and how much effort you want to put into
the filter. If you are primarily concerned about messages in a single
character set and want to block all of those, simply adapt the code from
SpamBouncer to filter them. For more complex filtering, SpamAssassin may be
a better choice, even for a mail relay.
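
For the simple case (block on declared charset only), here is a minimal
Python sketch that also turns the multipart question into an explicit
choice. It is not SpamBouncer's code; should_block and its require_all
parameter are hypothetical names chosen for illustration:

```python
import email
import re
from email import policy

# Same charset names as in the pattern above, matched against the whole name.
BLOCKED = re.compile(r"big5|gb-?1988|gb-?2312", re.IGNORECASE)


def should_block(raw: bytes, require_all: bool = False) -> bool:
    """Decide on a message using only the charsets its parts declare.

    require_all=False: block if ANY part declares a blocked charset.
    require_all=True:  block only if EVERY part that declares a charset
                       uses a blocked one.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    charsets = [p.get_content_charset() for p in msg.walk()
                if not p.is_multipart()]
    charsets = [c for c in charsets if c is not None]
    if not charsets:
        return False  # nothing declared, nothing to judge on
    hits = [BLOCKED.fullmatch(c) is not None for c in charsets]
    return all(hits) if require_all else any(hits)
```

With a message whose parts are GB2312 and US-ASCII, should_block returns
True by default and False with require_all=True. As discussed above, this
only sees what the sender declares, so UTF-8 or entity-encoded Chinese
passes straight through.
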
--
Klaus Johannes Rusch
KlausRusch(_at_)atmedia(_dot_)net
http://www.atmedia.net/KlausRusch/
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail