Re: Invalid message-ids

At 02:37 PM 8/11/97 EDT, Eli the Bearded wrote:

lists(_at_)professional(_dot_)org (Lists account) wrote:

"\ " - one space     (Normal regexp for this is "\s", but egrep doesn't
recognize it -- "\ " IS valid though, as would be a non-escaped space.
[:space:] should work too.)

\s and [:space:] are character classes. They match a single whitespace
char, and not specifically a space. Procmail does not support either.

From man procmailrc:

"These regular expressions are completely compatible to the normal egrep(1)
extended regular expressions"

\s was a slip from Perl (Eli, how's that port coming along???), but
[:space:] is direct from man egrep.  I don't disagree with you, but could
you point me to the document describing what these egrep(1) extended
regular expressions are?  On my system man egrep is .so grep.1, and
the extended regexp section there says these are valid.  I haven't been
using them, so I had no reason to doubt the documentation I had when I
went quoting from it.
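
For anyone following along, the class-versus-literal distinction can be
shown outside of procmail (which, per the above, supports neither \s nor
[:space:]).  This is only a minimal sketch: Python's re module stands in
for a Perl-style engine, and the names LITERAL_SPACE, ANY_WHITESPACE and
matches are made up for illustration.

```python
import re

# A literal space matches only the space character itself.
LITERAL_SPACE = re.compile(r" ")
# \s is a character class: it matches one whitespace character,
# which may be a space, a tab, a newline, and so on.
ANY_WHITESPACE = re.compile(r"\s")


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = {"space": "a b", "tab": "a\tb", "none": "ab"}
    for name, text in samples.items():
        print(name, matches(LITERAL_SPACE, text), matches(ANY_WHITESPACE, text))
```

The point is simply that a pattern written with the class accepts a tab
where a pattern written with a plain space does not.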

"\<" - Match the empty string BEFORE a word (see man procmailrc)
"\>" - Match the empty string AFTER a word (see man procmailrc)

Nope. In vi and some greps, yes, not in procmail. Both are shortcuts for
(basically):


You're right, sorry.  It is indeed a SINGLE DELIMITING CHARACTER before,
and after, a word, respectively.  I was quoting from the egrep
documentation at the time.

The important distinction to note here is that while \< \> are
interchangeable and *not* zero-width (or "empty string") they will
match line end conditions. (I have heard tell that $ and ^ are not


Good to note.
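
A rough way to see the difference between a zero-width word boundary and
a delimiter that consumes a character is sketched below.  It is not
procmail's matcher: it assumes that \< and \> can be approximated as "one
non-word character, or a line edge", and uses Python's \b as the
zero-width counterpart.  The pattern names are hypothetical.

```python
import re

# Zero-width boundaries: the match covers only the word itself.
ZERO_WIDTH = re.compile(r"\bbar\b")

# Consuming delimiters (an approximation of procmail's \< and \>):
# each side matches one non-word character, or falls back to a line edge.
CONSUMING = re.compile(
    r"(?:^|[^A-Za-z0-9_])bar(?:[^A-Za-z0-9_]|$)", re.MULTILINE
)


def span_of(pattern, text):
    """Return the (start, end) span of the first match, or None."""
    found = pattern.search(text)
    return found.span() if found else None


if __name__ == "__main__":
    text = "foo bar baz"
    print(span_of(ZERO_WIDTH, text))  # the word only
    print(span_of(CONSUMING, text))   # the word plus a character each side
    print(span_of(CONSUMING, "bar"))  # line edges stand in for delimiters
```

In the consuming form the surrounding spaces are part of the match, so
anything written after the delimiter has to account for the character
already having been used up.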

* ^Message-ID:\ +<\ *>

The backslashes are not needed. (They are harmless, though.)


I understand that.  I do believe I mentioned that in my dissection of the
original regexp (see above).  If for nothing else, they serve to clearly
demarcate spaces in a printed context if each space is escaped in such a
fashion.  When I write syntax critical things on paper, I place a "floating
dot" (not a period, but a small dot in the middle of a character cell) to
demarcate hard spaces for just this reason.

In any event, the recipe I presented should be valid (unnecessarily escaped
spaces and all), for the simple task of matching empty or nonexistent
message IDs - well, except those instances where the header exists, but
isn't followed by a bracketed string, which leads me to the following
modification of it:

The following recipe _should_ catch instances of the header being present
but not followed by any alphanumeric character (some combination of which
should normally appear in the message ID) or underscore -- that is, the
header followed by nothing, or just spaces (or tabs), or an empty <>, and
several other variants:

# Catch functionally "empty" messageids.
:0:
* ^Message-ID:\>*$
SPAM

# Catch messages without a messageid.
:0:
* ! ^Message-ID:
SPAM

I ran this through several header tests, and it functioned fine.  Comments?
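
For those who want to poke at the logic without touching a live
.procmailrc, here is a small Python approximation of the two conditions.
It is a sketch under assumptions, not procmail's own matcher: \> is
approximated as a single non-word character (newlines excluded for
simplicity), matching is case-insensitive as procmail's is by default,
and the names EMPTY_ID, HAS_ID and classify are invented for this example.

```python
import re

FLAGS = re.MULTILINE | re.IGNORECASE

# Approximates "* ^Message-ID:\>*$": the header name followed only by
# characters that are not letters, digits, or underscore.
EMPTY_ID = re.compile(r"^Message-ID:[^A-Za-z0-9_\n]*$", FLAGS)

# Approximates "* ! ^Message-ID:" (negated by the caller below).
HAS_ID = re.compile(r"^Message-ID:", FLAGS)


def classify(header_block):
    """Return 'empty', 'missing', or 'ok' for a block of header lines."""
    if EMPTY_ID.search(header_block):
        return "empty"    # first recipe would fire
    if not HAS_ID.search(header_block):
        return "missing"  # second recipe would fire
    return "ok"           # neither recipe fires


if __name__ == "__main__":
    tests = [
        "From: a@b\nMessage-ID: <>\nSubject: x",
        "Message-ID:",
        "Message-ID: <123@host>",
        "From: a@b\nSubject: x",
    ]
    for block in tests:
        print(repr(block), "->", classify(block))
```

Note that under this approximation an ID such as <@> also counts as
"empty", since it contains no letter, digit, or underscore.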

---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395