procmail

Re: Consecutive lines

2003-04-04 08:06:18
I must apologize for two things.  First, I am sorry if I
came across as overly critical.  I was merely trying to
point out shortcomings for the benefit of all.  Second,
to my utter embarrassment, I somehow started spelling
"obfuscated" as "obtusificated".  How obtuse of me! :-)

Dallman Ross wrote:

Well, okay, but of the 5,000 spams I get a month, I finally saw one
that tried this new trick, and I coded for it.  If you see production
examples of other variations in spam, we can make them all go away.
I don't want to code for theoretical maybes; I have enough real
spam to dispatch as efficiently as I can, thanks.  :)

The multipart boundary without "_NextPart_" is by no means
theoretical.  Much of my spam has a random string there.
Heck, even NS/Mozilla mail uses a random string.  Also, some
amount of theory and anticipation is required when you are
trying to stay one step ahead of these buggers.

As for the blank lines in between, my spam had that, and my example
code handles it fine.  Spaces or tabs will also be handled fine.

If the spammer puts other text in there, then, yes, you're right,
it wouldn't be discovered by this simple test.  Then we'd have to
fall back on more traditional spam identification.

This is my biggest worry.  It is a short step from blank to a
bit of text.

As for the general feasibility, fleet says it got almost 1% of
his spam archive, which is not so bad.  I am no MIME expert or

You know that standard disclaimer about investments and spam:  Past
performance does not guarantee future results. ;-)  IMHO testing
against an archive is only marginally useful since spam is
constantly evolving.  Heck if it was static then I would not
have this procmail/regexp addiction.  It's like crack man! :-)

specialist, as reaffirmed by the fact that almost every one of
my very effective anti-spam recipes looks only at the headers,
because I long ago determined that body manipulation in the
search for spam is not necessary or required.  So if _NextPart_
is not the generic or canonical or only way to form a valid MIME
separator, well, change it to the generic or canonical or additional
way.  Since these are weighted conditions in my example, we can
have a slew of them in the recipe, if we need to hit on any one
of them ORed.

The best way to get the boundary would be to use a match on the
original boundary definition in the header and then use that as
your regexp string.
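
As a rough illustration of that idea (in Python rather than procmail,
and not anyone's actual recipe), here is a minimal sketch.  It reads the
boundary parameter from the Content-Type header, escapes it so a random
string is matched literally, and then looks for two boundary lines with
nothing but blank lines, spaces or tabs between them.  The function
names boundary_regex and has_empty_part are made up for this example.

```python
import re
from email import message_from_string


def boundary_regex(raw_message):
    """Build a pattern for this message's own boundary lines, or None."""
    msg = message_from_string(raw_message)
    boundary = msg.get_boundary()  # None when no boundary parameter exists
    if boundary is None:
        return None
    # re.escape keeps characters like "+" or "." in a random boundary literal.
    # The optional "--" suffix also matches the closing boundary line.
    return re.compile(
        r"^--" + re.escape(boundary) + r"(--)?[ \t]*\r?$", re.MULTILINE
    )


def has_empty_part(raw_message):
    """True if two boundary lines have only whitespace between them."""
    pattern = boundary_regex(raw_message)
    if pattern is None:
        return False
    prev_end = None
    for match in pattern.finditer(raw_message):
        gap = raw_message[prev_end:match.start()] if prev_end is not None else None
        if gap is not None and gap.strip() == "":
            return True
        prev_end = match.end()
    return False
```

Because the pattern is built from the header of each message, it does
not care whether the boundary contains "_NextPart_" or a random string.
It still says nothing about a part that holds a bit of filler text.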

I guess what I am saying is that there is probably no simple way of
protecting against obtusificated html that has a reasonably valid,
if unrelated, plain text part.  One would have to do some basic html


If your new requirement is that the text part is reasonably valid,
then I tend to agree.  That seems like a different limitation than
your earlier statement, however.


Yes I was making two points about the limitation:  1. Boundaries
are generic (present problem); and 2. Valid plain text could be
inserted (future problem).


rendering first.  I guess you could weight against excessive html
comments (<!--.*-->) but then they could just break things up with
redundant formatting commands instead.  I think the spammers may have
us on this one.


They have us when they have us; not when we can conceive of ways for
them to have us.  So far, spammers continue to genuflect to the least
common denominator in most of the things they do, and simple tests
continue to be 99% effective, as they have been for months.

Just as we need gobs of extra coding and sweat effort to attain
that last 1%, so do the spammers.  Since they have no economic
motivation to do that, and since they are mostly in it for money
while we are mostly in it for what I'll call the two E's, essence
and elegance, I'm not ready to acquiesce and agree that they have
us.

RAH!  RAH!  YAAAA!!!  And I mean it.  Talk about an inspiring
speech for the battered and weary troops!!  I'm going to put that
excessive comment weighting in my recipe today.  We shall never
surrender; whatever the cost may be!
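
For the curious, the excessive comment weighting could be sketched
along these lines.  This is Python rather than a procmail recipe, and
the allowance of three free comments and the weight of 1.0 per extra
comment are arbitrary assumptions for illustration, not tuned values.

```python
import re

# Non-greedy so each comment is counted separately; DOTALL lets a
# comment span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def comment_score(body, weight=1.0, free=3):
    """Add `weight` for every HTML comment beyond the `free` allowance."""
    count = len(HTML_COMMENT.findall(body))
    return weight * max(0, count - free)


# Example: five comments used to break up one word score 2.0.
sample = "V<!--a-->i<!--b-->a<!--c-->g<!--d-->r<!--e-->a"
score = comment_score(sample)
```

A score like this would only be one weighted condition among several,
since redundant formatting tags could do the same job as comments.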

--
Daryle A. Tilroe


