Re: Problems with tonns (!!!) of spam

2003-05-19 11:41:33
At 15:35 2003-05-15 +0200, Michelle Konzack wrote:
>Yea, provided your "local" mail server is on the internet side of the
>cellphone link you're paying through some body orifice for.

Not right, because I have a local network with 31 servers/workstations
and this mail server handles all mail.

Then how do you effectively filter the mails without having to DOWNLOAD THEM FIRST? To do this well, you need to run the filtering on a server which is BEFORE your cellphone:


(internet) --- (your upline ISP) --- (cellphone) --
                -- (your gateway/mail server) -- (your lan)

From what I gather here, you're running filtering on your gateway server, which is AFTER the cellphone (i.e., in order to filter, you're taking a traffic hit, even if only to download headers and delete messages based on those). What you need to do is filter the messages at your upline ISP and then have your gateway server pull down whatever remains (with fetchmail or what-have-you), meaning that the only traffic over the cell link should be messages you're retaining.

Say on any given day, you get 500 messages, 1/2 of which are spam (your original post, about a month ago, gave the impression that you're getting a LOT of spam). Further, let's say that each of those messages has only a 1KB block of headers (the procmail list certainly doesn't travel so light). So, each day, you're downloading 500KB (1/2MB) of headers JUST TO EXAMINE THEM, discarding the messages which 'mailfilter' thinks are spam, and then later, when fetchmail runs (I trust you're keeping the two synchronized), it's going to download 250KB of headers which were previously downloaded (but not retained in any fashion for fetchmail's use). So, by spam filtering THROUGH your cellphone link, you're increasing your traffic by 500KB (in this scenario), though yea, you're eliminating spam from being part of the subsequent download. But -- if you filter at your ISP server, you don't download 500KB of headers to examine them. You just run fetchmail, and download the messages which are waiting for you. At any individual check, you may not be seeing so many messages, but if you run the math, you might come to realize that filtering THROUGH the cell connection isn't the way to go - certainly not if the goal here is to reduce data cost.
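
To make the arithmetic in that scenario concrete, here is a rough sketch in Python. The function name and parameters are made up for illustration; the figures (500 messages, half spam, 1KB of headers each) are the ones from the example above, not measurements.

```python
def header_overhead_kb(messages_per_day, spam_fraction, header_kb):
    """Header traffic when filtering THROUGH the link (illustrative only).

    Returns (examined_kb, refetched_kb):
      examined_kb  - headers pulled down just so the filter can look at them
      refetched_kb - headers of the kept messages, downloaded a second time
                     when the full messages are fetched afterwards
    """
    examined_kb = messages_per_day * header_kb
    kept_messages = messages_per_day * (1 - spam_fraction)
    refetched_kb = kept_messages * header_kb
    return examined_kb, refetched_kb


examined, refetched = header_overhead_kb(500, 0.5, 1)
print("headers downloaded only to examine them:", examined, "KB")
print("headers of kept mail downloaded again:", refetched, "KB")
```

The first number is the extra traffic compared with filtering at the ISP, since none of those header-only downloads happen in that setup.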

Furthermore, if conservation of bandwidth is a major concern, you might consider setting up a system whereby your gateway server shells to your upline ISP account (with ssh I'd hope), collects up all the messages since the last event, and compresses them into one bz2 (or zip, or gz, or whatever) file, depositing that someplace from which you could then retrieve it (say, with wget or rsync, or whatever) down to your local machine, where your script could then decompress it and pump the messages through procmail locally to redeliver them. Text messages should compress very well, especially if you have a lot of them together (due to efficiencies in the compression dictionary). This could result in an appreciable reduction in traffic charges - provided that your mail checks are sufficiently spaced as to allow for the downloads to be large enough to have developed good compression dictionaries. If you check mail with fetchmail every 5-10 minutes, this isn't likely to buy you a lot. If you check every hour or two (hey, you're on a cellphone link that's costing you through the nose, it'd make some sense), then you could have reasonable bunches of email waiting. Heck, if it's supposed to be a business network (or do you really have 17 housemates?), checking at a regular period up until a few hours after close of business and then doubling or tripling the checking delay until sometime before open of business could also save you some airtime.
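
If you want to see the batching effect for yourself, something like the following Python sketch would do. The messages are synthetic (the addresses, headers, and bodies are invented), so the exact sizes mean nothing; the point is only the comparison between compressing each message alone and compressing the whole batch at once.

```python
import bz2


def make_message(n):
    """Build one synthetic message; all names and addresses are made up."""
    headers = (
        "Received: from mail.example.org by pop.example.net; Mon, 19 May 2003\r\n"
        "From: list-member@example.org\r\n"
        "To: procmail@lists.example.org\r\n"
        f"Subject: Re: sample thread, message {n}\r\n"
        f"Message-ID: <{n}@example.org>\r\n"
        "\r\n"
    )
    body = f"This is the body of message number {n}.\r\n" * 5
    return (headers + body).encode("ascii")


messages = [make_message(n) for n in range(200)]

raw_size = sum(len(m) for m in messages)
# Each message compressed on its own: no shared dictionary between them.
separate_size = sum(len(bz2.compress(m)) for m in messages)
# All messages concatenated and compressed as a single batch.
batched_size = len(bz2.compress(b"".join(messages)))

print("raw bytes:", raw_size)
print("compressed one at a time:", separate_size)
print("compressed as one batch:", batched_size)
```

Real mail is less repetitive than this toy data, so expect a smaller gain, but list traffic with near-identical header blocks tends to benefit in the same way.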

[snip]

I suspect you're being a bit pre-emptive WRT what you're downloading, and are probably what I'd refer to as "over subscribed" - subscribing to a lot of stuff that perhaps isn't read at all, in the simple hope that if it is read by more than one person, you don't take the added traffic hit. Perhaps you should set up a proxy server and tell users to follow the listservs via web interface?

Different listservs deliver in different fashions, but many still deliver based on efficiency of the MTA transaction - if there are 17 users at one domain (and if I gather right, in your case, a single hosted POP mailbox), then ONE message will be sent via SMTP. Check the return-path and see whether it is unique to the subscribed address or not. If it isn't, then pre-emptive subscriptions are wasting your bandwidth and complicating the reply process for anyone who is actually participating on a list.
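
The return-path check can be eyeballed, but a crude Python sketch of it might look like this. It assumes the list encodes the subscriber into the bounce address in the common "list-bounces+user=domain@listhost" style; that format, the function name, and the addresses in the usage lines are assumptions for illustration, and a substring match like this is only a heuristic.

```python
from email import message_from_string
from email.utils import parseaddr


def return_path_is_per_subscriber(raw_message, subscriber):
    """Guess whether the Return-Path is unique to one subscribed address.

    Heuristic: both the local part and the domain of the subscriber
    address appear inside the local part of the bounce address.
    """
    msg = message_from_string(raw_message)
    _, bounce_addr = parseaddr(msg.get("Return-Path", ""))
    local, _, domain = subscriber.partition("@")
    if not local or not domain:
        return False
    bounce_local = bounce_addr.rpartition("@")[0].lower()
    return local.lower() in bounce_local and domain.lower() in bounce_local


sample = (
    "Return-Path: <procmail-bounces+user=example.org@lists.example.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(return_path_is_per_subscriber(sample, "user@example.org"))
```

If this comes back false for every list you try, the lists are most likely sending one shared envelope per domain, and the extra subscriptions are buying you nothing.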

So never I will answer to linux2(_dot_)mailinglits(_at_)fr(_dot_)(_dot_)(_dot_)

... which makes downloading them via archives or digest perhaps a much better solution.

See the header: I use my private mail address, and the others can do
it too.

Presuming the lists allow for nonsubscriber posts. It's a pity if they do, because that allowance results in spam abuses on mailing lists (which of course in turn leads to the problems you're having with spam traffic). A great many of the lists to which I am personally subscribed are run through my spam filtering - merely being a list doesn't make them immune to distribution of spam.

On my private address I subscribe too, but set the mailinglist to
'nomail'.

I don't follow this - are you subscribed to the list with your _own_ address in addition to the address which is used to share the list with the other users of your network?

Currently I am trying 'mailfilter', which is a Debian package.

I still have to read the manpages, but 'mailfilter' filters the
mails before downloading,

You can't filter something unseen - 'mailfilter' is downloading the headers of the messages in order to examine them. That means you're taking the traffic hit of the message headers, which can be several KB per message. That would INCLUDE the headers to all the messages which you eventually download in whole, by another process which ends up re-downloading those headers along with their bodies.

That is a losing game.

Let me repeat my earlier advice: if you're really interested in reducing the amount of bandwidth you're using for email, perform your junk filtering on the ISP server BEFORE your cell connection. Not through it.

I use procmail already in conjunction with fetchmail, but I need
the SPAM filtering before downloading it.

Run procmail on the ISP mail server.

Mail arrives for your account with the ISP server (to be deposited in the POP mailbox which you're fetchmail'ing from), whereupon you run it through a procmail filter that discards spam (you could use something homebrew, SpamAssassin, etc), sending what remains into the POP mailbox of the user account there (or, into a separately compressible archive, as mentioned above). THEN, your own mail system (that which is on your local side of your cellphone link) can download these messages and deliver them locally to your accounts.

Or perhaps are you doing this all through some French equivalent to AOL or MSN, wherein you get no shell access to the upline server and it's just a POP account? If that's the case, keep in mind that spending a few bucks more for a better ISP would allow you to save considerably on your airtime. You should sit down and get some mail figures and punch them into a spreadsheet - legit messages/day, junk messages/day, legit mail KB/day, junk mail KB/day, cost per KB over your cell (and thus, cost per day for legit and junk mail), average KB/message header downloaded for 'mailfilter' (which is your spamfiltering overhead). Then, figure what it is costing you to download the mail headers to do spam checking on them - that monthly amount can be applied directly towards a more functional ISP service permitting you to run filters at the ISP server side of the equation. Since you can expect that over time, spam levels will increase (even if you subscribe to fewer lists, reducing your legit traffic, the spam levels won't come down at the same rate), this filter-at-the-ISP server method would make your cost-per-legit-MB-transferred that much cheaper. Your effective connection speed will be improved as well, since YOU will have less overhead over your connection associated with junk (or checking for junk).
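
If a spreadsheet isn't handy, the same bookkeeping fits in a few lines of Python. This is only a sketch: the function and its parameter names are invented, and every number in the usage lines is a placeholder to be replaced with your own measured figures and your own per-KB rate.

```python
def daily_costs(legit_kb, junk_kb, messages, header_kb, cost_per_kb):
    """Cost per day of each kind of traffic over the cell link.

    'junk' is what the spam would cost if downloaded in full;
    'filter_overhead' is the cost of pulling every message's headers
    just so a header-based filter can examine them.
    """
    return {
        "legit": legit_kb * cost_per_kb,
        "junk": junk_kb * cost_per_kb,
        "filter_overhead": messages * header_kb * cost_per_kb,
    }


# Placeholder figures, NOT real measurements.
costs = daily_costs(legit_kb=1500, junk_kb=500, messages=500,
                    header_kb=1.0, cost_per_kb=0.01)
monthly_overhead = costs["filter_overhead"] * 30  # assumes a 30-day month

print(costs)
print("monthly cost of header checking:", monthly_overhead)
```

The monthly overhead figure is the one to hold up against the price difference for an ISP account that lets you run filters on their side.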

>Alternately, just run all your mail through something like SpamAssassin,
>via a procmail wrapper which ditches the highly suspect stuff.  Lots of
>this stuff is documented in the procmail archives.

Already done, including amavis and f-prot.

I'm getting the distinct impression that you're still unclear that THIS SHOULD BE PERFORMED ON THE ISP MAILSERVER, NOT ON YOUR OWN. Geez, the savings from avoiding email viruses such as Klez and the lot could be significant, if you filter them BEFORE you have to download them. Some things you'll only be successful at filtering if you have access to the message body (say, to note that there's a masked executable extension on the attachment name), so 'mailfilter' won't do you any good with them.
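
As an illustration of the sort of check that needs the message body, here is a minimal Python sketch that looks for a double extension ending in an executable type on an attachment name. The extension list and the function name are assumptions for the example, not a complete or authoritative rule, and on the ISP server this kind of test would more naturally live in a procmail recipe or a virus scanner.

```python
from email import message_from_string

# Illustrative list only; real filters use a much longer one.
EXECUTABLE_EXTENSIONS = (".exe", ".pif", ".scr", ".bat", ".com")


def has_masked_executable(raw_message):
    """True if any attachment name looks like 'something.jpg.exe'."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue
        name = filename.lower().strip()
        # At least two dots, and the final extension is an executable one.
        if len(name.split(".")) >= 3 and name.endswith(EXECUTABLE_EXTENSIONS):
            return True
    return False


sample = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="holiday.jpg.exe"\n'
    "\n"
    "data\n"
    "--XX--\n"
)
print(has_masked_executable(sample))
```

Note that the attachment name sits in the MIME part headers inside the body, which is exactly what a header-only download never sees.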

>You might be happy to know that the European cellular services are a LOT
>cheaper than their American counterparts.

It depends in which European Country you live...

With the provider I use (but I certainly don't use them for data!), it is US$0.03/KB, or you can pre-purchase blocks of data, up to US$49.99 for 13MB (with a lower per-KB rate thereafter of only US$0.01/KB). Your scenario of 2MB/day would cost US$1843/mo at the pay-as-you-go rate, or US$531/mo if you bought up to the highest tier and paid per KB thereafter. And that's for less than 1/10 of a single CD-ROM worth of data. Let me reiterate: you've got it cheap compared to what we pay here.
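
For anyone who wants to check those monthly figures, here is the arithmetic as a short Python sketch. It assumes a 30-day month and 1MB = 1024KB; the function names are made up, and the rates are the ones quoted above.

```python
KB_PER_MB = 1024        # assumption: binary megabytes
DAYS_PER_MONTH = 30     # assumption: 30-day month


def pay_as_you_go(mb_per_day, rate_per_kb=0.03):
    """Monthly cost in US$ at a flat per-KB rate."""
    return mb_per_day * KB_PER_MB * DAYS_PER_MONTH * rate_per_kb


def top_tier(mb_per_day, block_price=49.99, block_mb=13, overage_per_kb=0.01):
    """Monthly cost in US$ with the prepaid block plus per-KB overage."""
    monthly_kb = mb_per_day * KB_PER_MB * DAYS_PER_MONTH
    extra_kb = max(0, monthly_kb - block_mb * KB_PER_MB)
    return block_price + extra_kb * overage_per_kb


print(round(pay_as_you_go(2)))  # 1843
print(round(top_tier(2)))       # 531
```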

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
