Re: bad message id's

1998-03-18 23:22:03
On Wed, 18 Mar 1998 16:35:39 -0800, Michael Helm
<helm(_at_)fionn(_dot_)es(_dot_)net> wrote:
Christopher Lindsey writes:
    :0:
    * !^Message-Id:[\t ]+<("[^"]+"|[^ <>@]+)@[^<>]*>$
                           ^^^^^^^^^^^^^^^^^
Is this compliant yet?  I'm not sure it's liberal enuf about
what's left of "@" compared to the spec.  For example,

Try this (not sure if it's up to spec either, I just fixed what I
thought was obviously wrong with the above):

    :0:
    * ! ^Message-Id:[   ]*<("[^"]*"|[^  <>@"]+)+@[^<>@     ]+>[    ]*$
    no-valid-msgid

This passes through 

  Message-Id: <"OP-MIME expo400:439*"" <allanm@op.x400.icl.co.uk>"@MHS>

but traps out

  Message-ID:  <M10250103.005.z2j35.1.980317164507Z
    .CC-MAIL*/O=HQ/PRMD=USDOE/ADMD=ATTMAIL/C=US
                /@MHS>
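
As a rough cross-check, the revised condition can be transliterated into Python's `re` syntax and run against the two headers above. This is only a sketch under assumptions that are not in the post: `re.IGNORECASE` stands in for procmail's default case-insensitive matching, `[ \t]` stands for the literal space and tab inside the brackets, `unfold` is a hypothetical helper approximating how folded header lines are joined before matching, and the local-part group is written without the inner `+` (it accepts the same strings) so that Python's backtracking matcher stays fast on long non-matching local parts.

```python
import re

# Transliteration of the revised recipe condition (see assumptions above).
MSGID_OK = re.compile(
    r'^Message-Id:[ \t]*<("[^"]*"|[^ \t<>@"])+@[^<>@ \t]+>[ \t]*$',
    re.IGNORECASE,
)


def unfold(header: str) -> str:
    """Join folded continuation lines, keeping their leading whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", header)


def is_valid_msgid(header: str) -> bool:
    """True when the header would NOT be caught by the negated condition."""
    return MSGID_OK.search(unfold(header)) is not None


passes = 'Message-Id: <"OP-MIME expo400:439*"" <allanm@op.x400.icl.co.uk>"@MHS>'
trapped = (
    "Message-ID:  <M10250103.005.z2j35.1.980317164507Z\n"
    "    .CC-MAIL*/O=HQ/PRMD=USDOE/ADMD=ATTMAIL/C=US\n"
    "                /@MHS>"
)

if __name__ == "__main__":
    print(is_valid_msgid(passes))   # True
    print(is_valid_msgid(trapped))  # False
```

The second header fails because, once unfolded, its local part contains whitespace outside of any quoted string.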

Just to highlight my changes:

  * Change :[\t ]+ to :[        ]* (\t is not valid in Procmail but I
    take it Chris is using it in a pseudocode fashion)

  * Permit empty pair of quotes (is this legal?)

  * Allow iterations over [pseudocode] (".*"|[^"]+) with a + 
    (I take it the localpart can't be empty)

  * Disallow spaces and tabs after @ and require at least one char

  * Permit whitespace after closing broket
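
To see what most of these changes do in practice, here is a small Python comparison of the original condition and the revised one. It is a sketch, not a procmail test: both patterns are transliterated into Python `re` syntax, `re.IGNORECASE` is assumed to mirror procmail's default, the inner `+` of the revised group is dropped (same accepted strings, friendlier to Python's matcher), and the sample headers are made up for illustration.

```python
import re

FLAGS = re.IGNORECASE

# Christopher Lindsey's original condition, with \t read as a tab.
OLD = re.compile(r'^Message-Id:[\t ]+<("[^"]+"|[^ <>@]+)@[^<>]*>$', FLAGS)

# The revised condition (inner '+' dropped, see the note above).
NEW = re.compile(
    r'^Message-Id:[ \t]*<("[^"]*"|[^ \t<>@"])+@[^<>@ \t]+>[ \t]*$', FLAGS
)

# Made-up headers, one per change in the list above.
CASES = [
    'Message-Id:<a@b>',            # no whitespace after the colon
    'Message-Id: <"x y""z"@b>',    # several quoted strings in a row
    'Message-Id: <a@ b>',          # space after "@"
    'Message-Id: <a@>',            # nothing after "@"
    'Message-Id: <a@b> ',          # whitespace after the closing broket
]


def verdicts(header: str) -> tuple:
    """Return (old_matches, new_matches) for one header line."""
    return (OLD.search(header) is not None, NEW.search(header) is not None)


if __name__ == "__main__":
    for case in CASES:
        print(repr(case), verdicts(case))
```

Remember that the recipe negates the match, so a `False` here means the message would be filed in `no-valid-msgid`.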

Can't there be backslash-escaped quotes inside double-quoted strings?
Any other problems?
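
On the backslash question: RFC 822 defines a quoted-string as qtext or quoted-pair characters between double quotes, so `\"` can legitimately appear inside one. The Python sketch below (same transliteration assumptions as before: `re.IGNORECASE`, `[ \t]` for space and tab, inner `+` dropped) suggests the revised pattern would trap such an id, and tries one possible quoted-string alternative that accepts it. The `ESCAPED` pattern and the sample header are hypothetical and have not been tried in an actual recipe, where backslash handling may differ.

```python
import re

# Revised condition from the post, transliterated.
REVISED = re.compile(
    r'^Message-Id:[ \t]*<("[^"]*"|[^ \t<>@"])+@[^<>@ \t]+>[ \t]*$',
    re.IGNORECASE,
)

# Hypothetical variant: inside quotes, allow any backslash-escaped character.
ESCAPED = re.compile(
    r'^Message-Id:[ \t]*<("([^"\\]|\\.)*"|[^ \t<>@"])+@[^<>@ \t]+>[ \t]*$',
    re.IGNORECASE,
)

# Made-up header whose quoted local part contains an escaped quote.
header = r'Message-Id: <"a\"b"@example.invalid>'

if __name__ == "__main__":
    print(REVISED.search(header) is not None)  # False
    print(ESCAPED.search(header) is not None)  # True
```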

Your point is well taken, it's clear message-id checks have to
be part of a scoring mechanism for spam control.  I'd like to
just trap non-conforming ones & not try to enumerate all the
spam special cases (espec since they probably have a limited
lifetime).  Or perhaps combine them (if it's non-conforming
and fits this other pattern, it's Spam &c).
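
The "combine them" idea could look something like the following Python sketch. Everything in it beyond the Message-Id pattern is an assumption made for illustration: the weights, the threshold, the `spam_score` function, and the `X-Example-Signal` header are all hypothetical, and the pattern is a transliteration of the revised condition rather than the recipe itself.

```python
import re

# Transliteration of the revised Message-Id condition (inner '+' dropped).
MSGID_OK = re.compile(
    r'^Message-Id:[ \t]*<("[^"]*"|[^ \t<>@"])+@[^<>@ \t]+>[ \t]*$',
    re.IGNORECASE,
)

SPAM_THRESHOLD = 100  # hypothetical cut-off


def spam_score(header_lines, extra_signals, bad_msgid_weight=60):
    """Add up weights for a missing/malformed Message-Id and any extra signals.

    header_lines:  unfolded header lines, one header per string
    extra_signals: iterable of (compiled_pattern, weight) pairs
    """
    score = 0
    # Like the negated recipe condition: no line carries a well-formed id.
    if not any(MSGID_OK.search(line) for line in header_lines):
        score += bad_msgid_weight
    for pattern, weight in extra_signals:
        if any(pattern.search(line) for line in header_lines):
            score += weight
    return score


# Hypothetical second signal; a real one would be some known spam pattern.
EXTRA = [(re.compile(r"^X-Example-Signal:", re.IGNORECASE), 60)]

if __name__ == "__main__":
    headers = ["Message-Id: <no at sign>", "X-Example-Signal: yes"]
    print(spam_score(headers, EXTRA) >= SPAM_THRESHOLD)
```

With these made-up weights neither signal reaches the threshold alone, so a non-conforming Message-Id only counts as spam when the other pattern also fits.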

Just as a data point, I have been using the "pedestrian" version of
this filter for several months (since 1997/08/06 07:15:59 sez RCS, but
that was a tweak of an earlier version) and it only caught one piece
of legit mail (and that was from a guy who was trying to write his own
MUA :-) ... but then that's mostly mail from a limited set of sources
(mailing lists, mostly Sendmail sites, mostly people using Pine, elm,
mutt, Pegasus, Eudora, etc).

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>
