Re: Problems with tons (!!!) of spam
2003-05-19 11:41:33
At 15:35 2003-05-15 +0200, Michelle Konzack wrote:
>Yea, provided your "local" mail server is on the internet side of the
>cellphone link you're paying through some body orifice for.
Not right, because I have a local network with 31 servers/workstations
and this mail server handles all mail.
Then how do you effectively filter the mails without having to DOWNLOAD
THEM FIRST? To do this well, you need to run the filtering on a server
which is BEFORE your cellphone:
(internet) --- (your upline ISP) --- (cellphone) --
-- (your gateway/mail server) -- (your lan)
From what I gather here, you're running filtering on your gateway server,
which is AFTER the cellphone (i.e., in order to filter, you're taking a
traffic hit, even if only to download headers and delete messages based on
those). What you need to do is filter the messages at your upline ISP and
then have your gateway server pull down whatever remains (with fetchmail or
what-have-you), meaning that the only traffic over the cell link should be
messages you're retaining.
Say on any given day, you get 500 messages, 1/2 of which are spam (your
original post, about a month ago, gave the impression that you're getting a
LOT of spam). Further, let's say that each of those messages only has a
1KB block of headers (the procmail list certainly doesn't travel so
light). So, each day, you're downloading 500KB (1/2MB) of headers JUST TO
EXAMINE THEM, discarding the messages which 'mailfilter' thinks are spam,
and then later, when fetchmail runs (I trust you're keeping the two
synchronized), it's going to download 250KB of headers which were
previously downloaded (but not retained in any fashion for fetchmail's
use). So, by spam filtering THROUGH your cellphone link, you're increasing
your traffic by 500KB (in this scenario), though yea, you're eliminating
spam from being part of the subsequent download. But -- if you filter at
your ISP server, you don't download 500KB of headers to examine them. You
just run fetchmail, and download the messages which are waiting for
you. At any individual check, you may not be seeing so many messages, but
if you run the math, you might come to realize that filtering THROUGH the
cell connection isn't the way to go - certainly not if the goal here is to
reduce data cost.
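
A quick Python sketch of that arithmetic, using the scenario's numbers (500 messages, half spam, 1KB of headers each). The function name and its parameters are made up for illustration, and message bodies are left at zero so that only the header traffic shows up:

```python
def daily_traffic_kb(total_msgs, spam_fraction, header_kb, body_kb=0.0):
    """Return (through_link_kb, at_isp_kb) for one day of mail.

    Hypothetical helper: compares filtering over the cell link
    with filtering on the ISP server before the link.
    """
    legit_msgs = total_msgs * (1 - spam_fraction)
    # Filtering through the link: every header is pulled once just to examine it...
    inspect_kb = total_msgs * header_kb
    # ...then the surviving messages are fetched in full (headers again, plus bodies).
    fetch_kb = legit_msgs * (header_kb + body_kb)
    through_link_kb = inspect_kb + fetch_kb
    # Filtering at the ISP: only the surviving messages ever cross the link.
    at_isp_kb = fetch_kb
    return through_link_kb, at_isp_kb


through_link, at_isp = daily_traffic_kb(500, 0.5, 1.0)
overhead = through_link - at_isp
print(f"through link: {through_link:.0f}KB, at ISP: {at_isp:.0f}KB, "
      f"overhead: {overhead:.0f}KB")
```

The overhead is the full set of headers pulled for inspection, and it grows with the total message count rather than with the number of messages you actually keep.
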
Furthermore, if conservation of bandwidth is a major concern, you might
consider setting up a system whereby your gateway server shells to your
upline ISP account (with ssh I'd hope), and collects up all the messages
since the last event and compresses them into one bz2 (or zip, or gz, or
whatever) file, depositing that someplace that you could then retrieve
(say, with wget or rsync, or whatever) down to your local machine, where
your script could then decompress it and pump the messages through procmail
locally to redeliver them. Text messages should compress very well, esp if
you have a lot of them together (due to efficiencies in the compression
dictionary). This could result in an appreciable reduction in traffic
charges - provided that your mail checks are sufficiently spaced as to
allow for the downloads to be large enough to have developed good
compression dictionaries. If you check mail with fetchmail every 5-10
minutes, this isn't likely to buy you a lot. If you check every hour or
two (hey, you're on a cellphone link that's costing you through the nose,
it'd make some sense), then you could have reasonable bunches of email
waiting. Heck, if it's supposed to be a business network (or do you really
have 17 housemates?), checking at a regular period up until a few hours
after close of business and then doubling or tripling the checking delay
until sometime before open of business could also save you some airtime.
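
To see why batching matters for compression, here is a small Python sketch comparing bz2 on each message separately against bz2 on the whole bunch at once. The messages are synthetic and far more repetitive than real mail, so the ratio it prints is exaggerated; the names and addresses are invented for the example:

```python
import bz2


def make_message(n):
    """Build a synthetic plain-text message (hypothetical content)."""
    headers = (
        f"From: user{n}@example.org\r\n"
        "To: list@example.org\r\n"
        f"Subject: Re: test message {n}\r\n"
        "Received: from mail.example.org by mx.example.org with SMTP\r\n"
        "Content-Type: text/plain; charset=us-ascii\r\n\r\n"
    )
    body = "This is the body of a fairly ordinary list message.\r\n" * 20
    return (headers + body).encode("ascii")


messages = [make_message(n) for n in range(50)]

raw = sum(len(m) for m in messages)
# Compressing each message on its own pays the per-stream overhead every time.
individually = sum(len(bz2.compress(m)) for m in messages)
# Compressing the whole batch lets repeated headers and text share one dictionary.
batched = len(bz2.compress(b"".join(messages)))

print(f"raw: {raw} bytes, individually: {individually} bytes, "
      f"batched: {batched} bytes")
```

Real mail will not shrink this dramatically, but the ordering (batched smaller than individually compressed, which is smaller than raw) is the point: a longer interval between checks gives the compressor more to work with.
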
[snip]
I suspect you're being a bit pre-emptive WRT what you're downloading,
and are probably what I'd refer to as "over subscribed" - subscribing to a
lot of stuff that perhaps isn't read at all, in the simple hope that if it
is read by more than one person that you don't take the added traffic
hit. Perhaps you should set up a proxy server and tell users to follow the
listservs via web interface?
Different listservs deliver in different fashions, but many still deliver
based on efficiency of the MTA transaction - if there are 17 users at one
domain (and if I gather right, in your case, a single hosted POP mailbox),
then ONE message will be sent via the SMTP. Check the return-path and see
whether it is unique to the subscribed address or not. If it isn't, then
pre-emptive subscriptions are wasting your bandwidth and complicating the
reply process for anyone who is actually participating on a list.
So I will never answer to linux2(_dot_)mailinglits(_at_)fr(_dot_)(_dot_)(_dot_)
... which makes downloading them via archives or digest perhaps a much
better solution.
See the header, I use my private mail address and the others can do
it too.
Presuming the lists allow for nonsubscriber posts. It's a pity if they do,
because that allowance results in spam abuses on mailing lists (which of
course in turn leads to the problems you're having with spam traffic). A
great many of the lists to which I am personally subscribed are run through
my spam filtering - merely being a list doesn't make them immune to
distribution of spam.
On my private address I subscribe too, but set the mailinglist to
'nomail'.
I don't follow this - are you subscribed to the list with your _own_
address in addition to the address which is used to share the list with the
other users of your network?
Currently I am trying 'mailfilter', which is a Debian package.
I still have to read the manpages, but 'mailfilter' filters the
mails before downloading,
You can't filter something unseen - 'mailfilter' is downloading the headers
of the messages in order to examine them. That means you're taking the
traffic hit of the message headers, which can be several KB per
message. That would INCLUDE the headers to all the messages which you
eventually download in whole, by another process which ends up
re-downloading those headers along with their bodies.
That is a losing game.
Let me repeat my earlier advice: if you're really interested in reducing
the amount of bandwidth you're using for email, perform your junk filtering
on the ISP server BEFORE your cell connection. Not through it.
I already use procmail in conjunction with fetchmail, but I need
the SPAM filtering before downloading it.
Run procmail on the ISP mail server.
Mail arrives for your account with the ISP server (to be deposited in the
POP mailbox which you're fetchmail'ing from), whereupon you run it through
a procmail filter that discards spam (you could use something homebrew,
SpamAssassin, etc), sending what remains into the POP mailbox of the user
account there (or, into a separately compressible archive, as mentioned
above). THEN, your own mail system (that which is on your local side of
your cellphone link) can download these messages and deliver them locally
to your accounts.
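
On the ISP server itself this would normally be a procmail recipe, but the decision it makes can be sketched in a few lines of Python. This assumes something upstream (SpamAssassin, for instance) has already tagged suspect mail with an "X-Spam-Flag: YES" header; the function names and sample messages are hypothetical:

```python
from email import message_from_string


def is_flagged_spam(raw_message):
    """True if an upstream tagger marked the message as spam.

    Assumes the tagger adds 'X-Spam-Flag: YES' to suspect mail.
    """
    msg = message_from_string(raw_message)
    return (msg.get("X-Spam-Flag") or "").strip().upper() == "YES"


def messages_to_keep(raw_messages):
    """Return only the messages that should reach the POP mailbox."""
    return [raw for raw in raw_messages if not is_flagged_spam(raw)]


# Two made-up messages: one tagged as spam, one ordinary list reply.
spam = "From: a@example.com\nSubject: Buy now\nX-Spam-Flag: YES\n\nbody\n"
ham = "From: b@example.org\nSubject: Re: procmail recipe\n\nbody\n"

kept = messages_to_keep([spam, ham])
print(f"{len(kept)} of 2 messages would be left for fetchmail to pull down")
```

Whatever does the tagging and discarding, what matters is that it runs on the far side of the cell link, so the discarded messages never generate any traffic on it.
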
Or perhaps are you doing this all through some French equivalent to AOL or
MSN, wherein you get no shell access to the upline server and it's just a
POP account? If that's the case, keep in mind that spending a few bucks
more for a better ISP would allow you to save considerably on your
airtime. You should sit down and get some mail figures and punch them into
a spreadsheet - legit messages/day, junk messages/day, legit mail/KB/day,
junk mail/KB/day, cost per KB over your cell (and thus, cost per day for
legit and junk mail), average KB/message header downloaded for 'mailfilter'
(which is your spamfiltering overhead). Then, figure what it is costing
you to download the mail headers to do spam checking on them - that monthly
amount can be applied directly towards a more functional ISP service
permitting you to run filters at the ISP server side of the
equation. Since you can expect that over time, spam levels will increase
(even if you subscribe to fewer lists, reducing your legit traffic, the
spam levels won't come down at the same rate), this filter-at-the-ISP
server method would make your cost-per-legit-MB-transferred that much
cheaper. Your effective connection speed will be improved as well, since
YOU will have less overhead over your connection associated with junk (or
checking for junk).
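
If a spreadsheet isn't handy, the same figures fit in a few lines of Python. Everything here is a placeholder: the class name, the field names, and especially the sample numbers, which are invented and should be replaced with your own measurements:

```python
from dataclasses import dataclass


@dataclass
class MailFigures:
    """Hypothetical container for the per-day figures listed above."""
    legit_per_day: int
    junk_per_day: int
    legit_kb_per_day: float
    junk_kb_per_day: float
    header_kb_per_msg: float   # average header size pulled by 'mailfilter'
    cost_per_kb: float         # cell link rate

    def daily_costs(self):
        # Headers of ALL messages are downloaded just to be examined.
        overhead_kb = (self.legit_per_day + self.junk_per_day) * self.header_kb_per_msg
        return {
            "legit": self.legit_kb_per_day * self.cost_per_kb,
            "junk": self.junk_kb_per_day * self.cost_per_kb,
            "filter_overhead": overhead_kb * self.cost_per_kb,
        }


# Invented sample numbers, for illustration only.
figures = MailFigures(250, 250, 1000.0, 750.0, 1.0, 0.01)
costs = figures.daily_costs()
monthly_overhead = costs["filter_overhead"] * 30
print(f"filtering overhead: {monthly_overhead:.2f} per 30-day month")
```

The monthly overhead figure is the one to compare against the price difference for an ISP that gives you server-side filtering.
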
>Alternately, just run all your mail through something like SpamAssassin,
>via a procmail wrapper which ditches the highly suspect stuff. Lots of
>this stuff is documented in the procmail archives.
Already done, including amavis and f-prot
I'm getting the distinct impression that you're still unclear that THIS
SHOULD BE PERFORMED ON THE ISP MAILSERVER, NOT ON YOUR OWN. Geez, the
savings from avoiding email viruses such as Klez and the lot could be
significant, if you filter them BEFORE you have to download them. Some
things you'll only be successful at filtering if you have access to the
message body (say, to note that there's a masked executable extension on
the attachment name), so 'mailfilter' won't do you any good with them.
>You might be happy to know that the European cellular services are a LOT
>cheaper than their American counterparts.
It depends in which European Country you live...
With the provider I use (but I certainly don't use them for data!), it is
US$0.03/KB, or you can pre-purchase blocks of data, up to US$49.99 for 13MB
(with a lower per-KB rate thereafter of only US$0.01/KB). Your scenario of
2MB/day would cost US$1843/mo at the pay-as-you-go rate, or US$531/mo if
you bought up to the highest tier and paid per KB thereafter. And that's
for less than 1/10 of a single CD-ROM worth of data. Let me reiterate:
you've got it cheap compared to what we pay here.
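
For anyone who wants to check those monthly figures, here is the arithmetic in Python. It assumes 1MB = 1024KB and a 30-day month, which is what reproduces the numbers above; the function names are just for illustration:

```python
KB_PER_MB = 1024        # assumption: binary megabytes
DAYS_PER_MONTH = 30     # assumption: 30-day billing month


def pay_as_you_go(kb, rate_per_kb=0.03):
    """Monthly cost in US$ at the flat per-KB rate."""
    return kb * rate_per_kb


def top_tier(kb, block_price=49.99, block_mb=13, overage_per_kb=0.01):
    """Monthly cost in US$ with the 13MB block plus per-KB overage."""
    overage_kb = max(0, kb - block_mb * KB_PER_MB)
    return block_price + overage_kb * overage_per_kb


monthly_kb = 2 * KB_PER_MB * DAYS_PER_MONTH   # 2MB/day
payg = pay_as_you_go(monthly_kb)
tiered = top_tier(monthly_kb)
print(f"pay-as-you-go: US${payg:.2f}/mo, top tier: US${tiered:.2f}/mo")
```
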
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail