Re: Problems with tonns (!!!) of spam

At 15:35 2003-05-15 +0200, Michelle Konzack wrote:

>Yea, provided your "local" mail server is on the internet side of the
>cellphone link you're paying through some body orifice for.

Not right, because I have a local network wit 31 Servers/Workstation
and this mail-Server handle all mails.

Then how do you effectively filter the mails without having to DOWNLOAD THEM FIRST? To do this well, you need to run the filtering on a server which is BEFORE your cellphone:



(internet) --- (your upline ISP) --- (cellphone) --
                -- (your gateway/mail server) -- (your lan)

From what I gather here, you're running filtering on your gateway server, which is AFTER the cellphone (i.e., in order to filter, you're taking a traffic hit, even if only to download headers and delete messages based on those). What you need to do is filter the messages at your upline ISP and then have your gateway server pull down whatever remains (with fetchmail or what-have-you), meaning that the only traffic over the cell link should be messages you're retaining.

Say on any given day, you get 500 messages, 1/2 of which are spam (your original post, about a month ago, gave the impression that you're getting a LOT of spam). Further, let's say that each of those messages has only a 1KB block of headers (the procmail list certainly doesn't travel so light). So, each day, you're downloading 500KB (1/2MB) of headers JUST TO EXAMINE THEM, discarding the messages which 'mailfilter' thinks are spam, and then later, when fetchmail runs (I trust you're keeping the two synchronized), it's going to download 250KB of headers which were previously downloaded (but not retained in any fashion for fetchmail's use). So, by spam filtering THROUGH your cellphone link, you're increasing your traffic by 500KB (in this scenario), though yea, you're eliminating spam from being part of the subsequent download. But -- if you filter at your ISP server, you don't download 500KB of headers to examine them. You just run fetchmail, and download the messages which are waiting for you. At any individual check, you may not be seeing so many messages, but if you run the math, you might come to realize that filtering THROUGH the cell connection isn't the way to go - certainly not if the goal here is to reduce data cost.
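
If you'd rather plug in your own figures than take mine, here is a rough Python sketch of that arithmetic. The function name and parameters are made up for illustration, and the 500 messages, 50% spam and 1KB of headers are just the guesses from the paragraph above, not measurements.

```python
def header_traffic_kb(total_msgs, spam_fraction, header_kb):
    """Header traffic per day when filtering THROUGH the cell link.

    Returns (examined_kb, refetched_kb):
      examined_kb  - headers pulled only so the filter can inspect them
      refetched_kb - headers of retained messages, pulled a second time
                     when fetchmail downloads those messages in full
    """
    examined_kb = total_msgs * header_kb
    kept_msgs = total_msgs * (1 - spam_fraction)
    refetched_kb = kept_msgs * header_kb
    return examined_kb, refetched_kb


# The scenario from the text: 500 messages/day, half spam, 1KB headers.
examined, refetched = header_traffic_kb(500, 0.5, 1)
print(f"examined: {examined:.0f}KB, fetched again: {refetched:.0f}KB")
```

The first number is the pure overhead of inspecting headers over the link; filtering at the ISP makes it zero.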

Furthermore, if conservation of bandwidth is a major concern, you might consider setting up a system whereby your gateway server shells to your upline ISP account (with ssh I'd hope), collects up all the messages since the last event and compresses them into one bz2 (or zip, or gz, or whatever) file, depositing that someplace that you could then retrieve (say, with wget or rsync, or whatever) down to your local machine, where your script could then decompress it and pump the messages through procmail locally to redeliver them. Text messages should compress very well, esp if you have a lot of them together (due to efficiencies in the compression dictionary). This could result in an appreciable reduction in traffic charges - provided that your mail checks are sufficiently spaced as to allow for the downloads to be large enough to have developed good compression dictionaries. If you check mail with fetchmail every 5-10 minutes, this isn't likely to buy you a lot. If you check every hour or two (hey, you're on a cellphone link that's costing you through the nose, it'd make some sense), then you could have reasonable bunches of email waiting. Heck, if it's supposed to be a business network (or do you really have 17 housemates?), checking at a regular period up until a few hours after close of business and then doubling or tripling the checking delay until sometime before open of business could also save you some airtime.
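
To illustrate why batching matters, here is a minimal Python sketch of the pack/unpack half of such a scheme. It is only a sketch: the length-prefix framing, the function names and the sample messages are my own inventions, and the ssh/rsync transport and the hand-off to procmail are left out entirely.

```python
import bz2
import struct


def pack_batch(messages):
    """Frame each message with a 4-byte length, then bz2 the whole lot."""
    framed = b"".join(struct.pack(">I", len(m)) + m for m in messages)
    return bz2.compress(framed)


def unpack_batch(blob):
    """Reverse of pack_batch: returns the original list of messages."""
    data = bz2.decompress(blob)
    messages, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        messages.append(data[pos:pos + length])
        pos += length
    return messages


# Made-up sample mail: 50 small messages with near-identical headers.
messages = [
    (f"From: list@example.org\r\nSubject: item {i}\r\n\r\n"
     f"Body of message {i}, mostly ordinary repetitive text.\r\n").encode()
    for i in range(50)
]

batch_size = len(pack_batch(messages))
one_by_one = sum(len(bz2.compress(m)) for m in messages)
print(f"one archive: {batch_size} bytes, compressed singly: {one_by_one} bytes")
```

On the local side, each message returned by unpack_batch would be piped to procmail in turn for redelivery.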


[snip]

I suspect you're being a bit pre-emptive WRT what you're downloading, and are probably what I'd refer to as "over subscribed" - subscribing to a lot of stuff that perhaps isn't read at all, in the simple hope that if it is read by more than one person you don't take the added traffic hit. Perhaps you should set up a proxy server and tell users to follow the listservs via web interface?

Different listservs deliver in different fashions, but many still deliver based on efficiency of the MTA transaction - if there are 17 users at one domain (and if I gather right, in your case, a single hosted POP mailbox), then ONE message will be sent via the SMTP. Check the return-path and see whether it is unique to the subscribed address or not. If it isn't, then pre-emptive subscriptions are wasting your bandwidth and complicating the reply process for anyone who is actually participating on a list.
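
A rough way to automate that return-path check, sketched in Python. This assumes the list encodes the subscriber in the envelope sender in the common "user=domain" style; lists differ in how (or whether) they do this, so treat the matching rule, the function name and the example addresses as hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def return_path_names_subscriber(raw_message, subscriber):
    """True if the Return-Path local part mentions this subscriber."""
    msg = message_from_string(raw_message)
    _, envelope = parseaddr(msg.get("Return-Path", ""))
    envelope_local = envelope.lower().rpartition("@")[0]
    local, _, domain = subscriber.lower().partition("@")
    return local in envelope_local and domain in envelope_local


# Made-up headers: one shared envelope sender, one per-subscriber.
shared = ("Return-Path: <somelist-bounces@lists.example.org>\n"
          "Subject: test\n\nbody\n")
per_user = ("Return-Path: <somelist-bounces+linux2=example.fr@lists.example.org>\n"
            "Subject: test\n\nbody\n")

print(return_path_names_subscriber(shared, "linux2@example.fr"))
print(return_path_names_subscriber(per_user, "linux2@example.fr"))
```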

So never I will answer to linux2(_dot_)mailinglits(_at_)fr(_dot_)(_dot_)(_dot_)

... which makes downloading them via archives or digest perhaps a much better solution.

Se the Header, I use my private Mail-Address and the other can do
it too.

Presuming the lists allow for nonsubscriber posts. It's a pity if they do, because that allowance results in spam abuses on mailing lists (which of course in turn leads to the problems you're having with spam traffic). A great many of the lists to which I am personally subscribed are run through my spam filtering - merely being a list doesn't make them immune to distribution of spam.

On my private address I subscribe too, but set the mailinglist to
'nomail'.

I don't follow this - are you subscribed to the list with your _own_ address in addition to the address which is used to share the list with the other users of your network?

Curently I try 'mailfilter', which is a Debian-Package.

Currently I must read the manpages, but 'mailfilter' filter the
Mails bevore downloading,

You can't filter something unseen - 'mailfilter' is downloading the headers of the messages in order to examine them. That means you're taking the traffic hit of the message headers, which can be several KB per message. That would INCLUDE the headers to all the messages which you eventually download in whole, by another process which ends up re-downloading those headers along with their bodies.


That is a losing game.

Let me repeat my earlier advice: if you're really interested in reducing the amount of bandwidth you're using for email, perform your junk filtering on the ISP server BEFORE your cell connection. Not through it.

I use procmail already in conjunction with fetchmail, but I need
the SPAM filtering bevore downloading it.


Run procmail on the ISP mail server.

Mail arrives for your account with the ISP server (to be deposited in the POP mailbox which you're fetchmail'ing from), whereupon you run it through a procmail filter that discards spam (you could use something homebrew, SpamAssassin, etc), sending what remains into the POP mailbox of the user account there (or, into a separately compressible archive, as mentioned above). THEN, your own mail system (that which is on your local side of your cellphone link) can download these messages and deliver them locally to your accounts.

Or perhaps are you doing this all through some French equivalent to AOL or MSN, wherein you get no shell access to the upline server and it's just a POP account? If that's the case, keep in mind that spending a few bucks more for a better ISP would allow you to save considerably on your airtime. You should sit down and get some mail figures and punch them into a spreadsheet - legit messages/day, junk messages/day, legit mail KB/day, junk mail KB/day, cost per KB over your cell (and thus, cost per day for legit and junk mail), average KB/message header downloaded for 'mailfilter' (which is your spamfiltering overhead). Then, figure what it is costing you to download the mail headers to do spam checking on them - that monthly amount can be applied directly towards a more functional ISP service permitting you to run filters at the ISP server side of the equation. Since you can expect that over time, spam levels will increase (even if you subscribe to fewer lists, reducing your legit traffic, the spam levels won't come down at the same rate), this filter-at-the-ISP server method would make your cost-per-legit-MB-transferred that much cheaper. Your effective connection speed will be improved as well, since YOU will have less overhead over your connection associated with junk (or checking for junk).
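
If a spreadsheet is too much bother, the same bookkeeping fits in a few lines of Python. This is only a sketch: the function, its parameter names, the 30-day month and the example inputs are all placeholders of mine, to be replaced with your real figures and your real per-KB rate.

```python
def monthly_mail_costs(legit_kb_day, junk_kb_day, msgs_day,
                       header_kb, cost_per_kb, days=30):
    """Monthly cost of legit mail, junk mail, and header-inspection overhead.

    The overhead term is what filtering through the link costs on top of
    the mail itself: every message's headers are fetched once just to be
    examined (msgs_day * header_kb per day).
    """
    overhead_kb_day = msgs_day * header_kb
    return {
        "legit": legit_kb_day * cost_per_kb * days,
        "junk": junk_kb_day * cost_per_kb * days,
        "filter_overhead": overhead_kb_day * cost_per_kb * days,
    }


# Placeholder inputs, NOT real tariffs or measured traffic.
costs = monthly_mail_costs(legit_kb_day=1000, junk_kb_day=1000,
                           msgs_day=500, header_kb=1, cost_per_kb=0.01)
for item, amount in costs.items():
    print(f"{item}: {amount:.2f} per month")
```

The "filter_overhead" line is the monthly amount you could put towards an ISP that lets you filter on their side.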

>Alternatley, just run all your mail through something like SpamAssassin,
>via a procmail wrapper which ditches the highly suspect stuff.  Lots of
>this stuff is documented in the procmail archives.

Already done inclusive amavis and f-prot

I'm getting the distinct impression that you're still unclear that THIS SHOULD BE PERFORMED ON THE ISP MAILSERVER, NOT ON YOUR OWN. Geez, the savings from avoiding email viruses such as Klez and the lot could be significant, if you filter them BEFORE you have to download them. Some things you'll only be successful at filtering if you have access to the message body (say, to note that there's a masked executable extension on the attachment name), so 'mailfilter' won't do you any good with them.
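
As an example of a check that needs the whole message and not just the top-level headers, here is a small Python sketch that looks for a double extension on attachment names (e.g. "photo.jpg.scr"). The extension list and the "two dots, last one executable" rule are my own simplification for illustration, not a complete virus test.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Illustrative only; a real filter would use a longer list.
EXECUTABLE_EXTS = {"exe", "scr", "pif", "bat", "com"}


def has_masked_executable(raw_bytes):
    """True if any attachment name ends in <something>.<ext>.<executable ext>."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        name = part.get_filename()
        if not name:
            continue
        pieces = name.lower().strip().rsplit(".", 2)
        if len(pieces) == 3 and pieces[2] in EXECUTABLE_EXTS:
            return True
    return False


def sample_message(filename):
    """Build a made-up test message carrying one attachment."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content("see attached")
    msg.add_attachment(b"dummy", maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg.as_bytes()


print(has_masked_executable(sample_message("photo.jpg.scr")))
print(has_masked_executable(sample_message("report.pdf")))
```

The attachment name lives in the MIME part headers inside the body, which is why a header-only pass never sees it.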

>You might be happy to know that the European cellular services are a LOT
>cheaper than their American counterparts.

It depends in which European Country you live...

With the provider I use (but I certainly don't use them for data!), it is US$0.03/KB, or you can pre-purchase blocks of data, up to US$49.99 for 13MB (with a lower per-KB rate thereafter of only US$0.01/KB). Your scenario of 2MB/day would cost US$1843/mo at the pay-as-you-go rate, or US$531/mo if you bought up to the highest tier and paid per KB thereafter. And that's for less than 1/10 of a single CD-ROM worth of data. Let me reiterate: you've got it cheap compared to what we pay here.
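
For anyone who wants to check those two monthly figures, the arithmetic in Python, assuming a 30-day month and 1MB = 1024KB (both my assumptions; the rates are the ones quoted above):

```python
KB_PER_MB = 1024      # assumption: binary megabytes
DAYS_PER_MONTH = 30   # assumption: 30-day month

monthly_kb = 2 * KB_PER_MB * DAYS_PER_MONTH   # 2MB/day scenario

# Pay-as-you-go: US$0.03 for every KB.
pay_as_you_go = monthly_kb * 0.03

# Highest tier: US$49.99 covers the first 13MB, US$0.01/KB after that.
tier_kb = 13 * KB_PER_MB
tiered = 49.99 + max(0, monthly_kb - tier_kb) * 0.01

print(f"pay-as-you-go: US${pay_as_you_go:.2f}/mo")
print(f"highest tier:  US${tiered:.2f}/mo")
```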


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail