John Oliver wrote:
There are some systems which just relay mail where SpamAssassin isn't an
option. I know there's gotta be a way to do this in procmail... isn't
there a header that tells what character set the content is, and a list
of valid character sets somewhere?

The character set can be determined from the message headers for plain
messages, or from the headers of each part for multipart messages: look for
the Content-Type header and pick out the charset parameter. ("English
character set" is somewhat misleading, as you probably want to allow Latin-1
and UTF-8 as well, neither of which is an English character set.)

But then what should you allow? For example, assuming you want to block
Chinese text, filtering on (big5|gb-?1988|gb-?2312) will catch some mail, but
there is no reason why someone could not send Chinese text in UTF-8 encoding,
or UTF-16, or an HTML message in ISO-8859-1 (the most commonly used encoding
for Web pages) or the "English" US-ASCII, with Chinese characters encoded as
numeric entities. If a message contains multiple parts in different
encodings, should it be accepted or rejected?
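
As a rough illustration of the charset filter mentioned above, a minimal
sketch of two recipes might look like the following. The folder name
"suspect-charset" is purely hypothetical, and the charset list is only the
one from the pattern above, so treat this as a starting point and not a
complete filter.

```procmail
# Recipe 1: plain (single-part) messages.
# Matches the charset parameter of the top-level Content-Type header.
# procmail matches case-insensitively by default.
:0:
* ^Content-Type:.*charset="?(big5|gb-?1988|gb-?2312)
suspect-charset

# Recipe 2: crude check for multipart messages.
# The B flag searches the body, where the per-part headers live.
# The pattern is deliberately not anchored to "Content-Type:" because
# the charset parameter is often folded onto a continuation line.
# Assumption: false positives (e.g. mail that merely discusses
# charsets) are acceptable for this folder.
:0 B:
* charset="?(big5|gb-?1988|gb-?2312)
suspect-charset
```

Neither recipe catches Chinese text sent as UTF-8, UTF-16, or as numeric
entities in an ISO-8859-1 or US-ASCII HTML part, which is exactly the
limitation described above.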

All of this is perfectly doable with procmail; you need to decide what
exactly to filter, though, and how much effort you want to put into the
filter. If you are only concerned about messages in a single character set,
and want to block all messages in that character set, you can adapt the code
from SpamBouncer to filter those messages. For more complex filtering,
SpamAssassin may be a better choice, even for a mail relay.

Klaus Johannes Rusch
procmail mailing list