procmail

Re: Filtering on character sets (was: Non-English character sets)

2003-12-04 03:45:32
John Oliver wrote:

> There are some systems which just relay mail where SpamAssassin isn't an
> option.  I know there's gotta be a way to do this in procmail... isn't
> there a header that tells what character set the content is, and a list
> of valid character sets somewhere?

The character set can be determined from the message headers for plain
messages, or from the headers of each part for multipart messages: look
for the Content-Type header and parse out the charset parameter.
("English character set" is somewhat misleading, as you probably want to
allow Latin-1 and UTF-8 as well, neither of which is an English
character set.)
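
As a rough illustration of the header parsing described above, here is a
minimal sketch in Python rather than procmail, using only the standard
email package. The function name declared_charsets is made up for this
example, and the sketch only reports what each text part declares; it
does not look at the content itself.

```python
from email import message_from_string


def declared_charsets(raw_message):
    """Return the charset declared by each text part of a message.

    Works for plain messages (one entry) and multipart messages (one
    entry per text part). An entry is None when the part has no
    charset parameter in its Content-Type header.
    """
    msg = message_from_string(raw_message)
    charsets = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no text of their own
        if part.get_content_maintype() != "text":
            continue  # skip attachments such as images
        # get_content_charset() lowercases the value it finds
        charsets.append(part.get_content_charset())
    return charsets
```

For a multipart message with a Big5 plain-text part and a UTF-8 HTML
part, this returns one entry per part, so the caller still has to decide
what to do when the entries disagree.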

But then what should you allow? For example, assuming you want to block
Chinese text, filtering on (big5|gb-?1988|gb-?2312) will catch some
mail, but there is no reason why someone could not send Chinese text in
UTF-8 or UTF-16 encoding, or in an HTML message labeled ISO-8859-1 (the
most commonly used encoding for Web pages) or US-ASCII (the "English
character set"), with the Chinese characters encoded as numeric
entities.  If a message contains multiple parts in different encodings,
should it be accepted or rejected?
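
To make the gap concrete, here is a small Python sketch (again not
procmail) that combines the charset-label test with a look at the text
itself. The function names and the choice of the CJK Unified Ideographs
block (U+4E00 to U+9FFF) are assumptions for illustration only; that
block is shared with Japanese kanji, so this check cannot tell Chinese
from Japanese. It also assumes the text has already been decoded to
Unicode.

```python
import html
import re

# Charset labels from the pattern above, matched case-insensitively.
CHINESE_CHARSET = re.compile(r"big5|gb-?1988|gb-?2312", re.IGNORECASE)


def has_cjk_ideographs(text):
    """True if text contains a character in U+4E00..U+9FFF, either
    literally or written as an HTML numeric entity such as &#20013;."""
    expanded = html.unescape(text)  # expands &#20013; and &#x4E2D; alike
    return any("\u4e00" <= ch <= "\u9fff" for ch in expanded)


def looks_chinese(charset, text):
    """Check the declared charset first, then fall back to the content."""
    if charset and CHINESE_CHARSET.search(charset):
        return True
    return has_cjk_ideographs(text)
```

With this, a part labeled US-ASCII whose body is "&#20013;&#25991;" is
flagged even though its charset label looks harmless, which is exactly
the case a label-only filter misses.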

All of this is perfectly doable with procmail.  You need to decide what
exactly you want to filter, though, and how much effort you want to put
into the filter.  If you are primarily concerned about messages in a
single character set and want to block all of those, simply adapt the
code from SpamBouncer to filter those messages.  For more complex
filtering, SpamAssassin may be a better choice, even for a mail relay.

--
Klaus Johannes Rusch
KlausRusch(_at_)atmedia(_dot_)net
http://www.atmedia.net/KlausRusch/



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
