procmail

Re: Consecutive lines

2003-04-04 01:35:06
On Thu, Apr 03, 2003 at 05:03:22PM -0700, Daryle A. Tilroe wrote:

Dallman Ross wrote:

On Wed, Apr 02, 2003 at 08:34:41PM -0700, Daryle A. Tilroe wrote:

It is, of course, trivial to construct a message with a different,
or empty, plain text section and still have the obtusificated
html spam (again I have not seen this yet but it seems an obvious
extension)... Hmmmm.

I had it happen early this week, and created a weighted condition to
handle it:

 * $ 6^0 ^Content-T[^$WS]+:[^:]+(^[$WS]*)+------=_NextPart_

I see a few problems with that regexp, both in how well it generalizes
and in the feasibility of the approach as a whole.  For one thing, the
boundary string does not need to contain "_NextPart_".  There could also
be other lines, or innocuous plain text content, before the boundary.

See simplified example below.

Well, okay, but of the 5,000 spams I get a month, I finally saw one
that tried this new trick, and I coded for it.  If you see production
examples of other variations in spam, we can make them all go away.
I don't want to code for theoretical maybes; I have enough real
spam to dispatch as efficiently as I can, thanks.  :)

As for the blank lines in between, my spam had that, and my example
code handles it fine.  Spaces or tabs will also be handled fine.

If the spammer puts other text in there, then, yes, you're right,
it wouldn't be discovered by this simple test.  Then we'd have to
fall back on more traditional spam identification.

As for the general feasibility, fleet says it caught almost 1% of
his spam archive, which is not so bad.  I am no MIME expert or
specialist, as reaffirmed by the fact that almost every one of
my very effective anti-spam recipes looks only at the headers,
because I long ago determined that body manipulation in the
search for spam is not necessary or required.  So if _NextPart_
is not the generic or canonical or only way to form a valid MIME
separator, well, change it to the generic or canonical or additional
way.  Since these are weighted conditions in my example, we can
have a slew of them in the recipe, if we need to hit on any one
of them ORed.
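
A minimal sketch of what such a recipe might look like, stacking a
second weighted condition under the first.  Assumptions, none of which
come from the post itself: WS is already set elsewhere in the rcfile to
a space and a tab, the second pattern (a run of dashes followed by
digits, as in the sample message at the end of this post) is only an
illustrative guess at another boundary style, and "spam-empty-part" is
a hypothetical mailbox name.

```procmail
# Assumes WS was defined earlier as one space plus one tab.
# B flag: scan the body.  Each condition adds 6 to the score on its
# first match only (the ^0 exponent), and procmail treats the recipe
# as matched when the final score is positive, so any ONE condition
# hitting is enough: in effect the conditions are ORed.
:0 B
* $ 6^0 ^Content-T[^$WS]+:[^:]+(^[$WS]*)+------=_NextPart_
* $ 6^0 ^Content-T[^$WS]+:[^:]+(^[$WS]*)+------+[0-9]+
spam-empty-part
```

Further boundary styles seen in real spam could be added as more lines
of the same shape without touching the existing ones.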


I guess what I am saying is that there is probably no simple way of
protecting against obtusificated html that has a reasonably valid,
if unrelated, plain text part.  One would have to do some basic html

If your new requirement is that the text part is reasonably valid,
then I tend to agree.  That seems like a different limitation than
your earlier statement, however.

rendering first.  I guess you could weight against excessive html
comments (<!--.*-->) but then they could just break things up with
redundant formatting commands instead.  I think the spammers may have
us on this one.
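
For illustration only, weighting against excessive html comments could
be sketched with procmail scoring roughly as follows.  The threshold of
10 and the mailbox name "spam-html-comments" are arbitrary assumptions,
and the recipe has the weakness already noted: it does nothing against
redundant formatting tags used in place of comments.

```procmail
# B flag: scan the body.
# Start 10 points in the hole, then add 1 for every comment opener.
# With ^1 every match counts in full, so the final score is positive
# (and the recipe matches) only when there are more than 10 openers.
# The opener is written [<]!-- because a bare "<" at the start of a
# condition would be read as a size test.
:0 B
* -10^0
*   1^1 [<]!--
spam-html-comments
```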

They have us when they have us; not when we can conceive of ways for
them to have us.  So far, spammers continue to genuflect to the least
common denominator in most of the things they do, and simple tests
continue to be 99% effective, as they have been for months.

Just as we need gobs of extra coding and sweat effort to attain
that last 1%, so do the spammers.  Since they have no economic
motivation to do that, and since they are mostly in it for money
while we are mostly in it for what I'll call the two E's, essence
and elegance, I'm not ready to acquiesce and agree that they have
us.

------------------------------------------------------------
*Message-ID: <3E8AF8BD(_dot_)30100(_at_)micralyne(_dot_)com>
*Date:        Wed, 02 Apr 2003 07:50:37 -0700
*From:        "Daryle A. Tilroe" <daryle(_at_)micralyne(_dot_)com>
*MIME-Version: 1.0
*To:  daryle <daryle(_at_)micralyne(_dot_)com>
*Subject: html test
*Content-Type: multipart/alternative;
* boundary="------------090000040406000708090209"
*
*
*--------------090000040406000708090209
*Content-Type: text/plain; charset=us-ascii; format=flowed
*Content-Transfer-Encoding: 7bit
*
*ANY OLD TEXT OR NOTHING AT ALL
*
*--------------090000040406000708090209
*Content-Type: text/html; charset=us-ascii
*Content-Transfer-Encoding: 7bit
*
*<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
*<html>
*<head>
*
*OBTUSIFICATED HTML HERE
*
*</body>
*</html>
*
*--------------090000040406000708090209--
-----------------------------------------------------------

-- 
dman
