procmail

Re: matching embeded newline char in subject

2002-06-06 09:35:17
Mark asked,

I have received numerous spam messages that appear to have embedded
newline characters in the middle of the subject line.
I have tried:
* ^Subject:.+()$.+
To match this, but it catches everything.

In the body, ^ or $ will match the newline between two lines of text (or a
putative one, but we don't need to get into that for this problem); however,
in the head, they match the newline between two header fields (or a putative
one).  They will not match the soft newline in the middle of a header field
with a continuation line, and that's what Mark is looking for.

Every message will match the regexp Mark tried as long as it has a Subject:
header at all.  The only exceptions are negligible: if Subject: is the
bottommost field in the header, or if it is totally empty, without even a
space after the colon.  You'll probably never see either of those unless
somebody deliberately contrives a message that way just to avoid matching that
regexp, so essentially it will catch any message that has a subject.
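
For anyone who wants to poke at that behavior outside procmail, here is a
rough Python model.  It is only a sketch: Python's re module is not procmail's
regexp engine, the unfold() helper and the sample headers are made up for
illustration, and the pattern is an approximation that treats $ followed by
more text as "a field boundary, then the next field".

```python
import re

# Approximation of `* ^Subject:.+()$.+`: some text after "Subject:", then the
# newline that separates two header fields, then the start of the next field.
CONDITION = re.compile(r"^Subject:.+\n.+", re.MULTILINE)


def unfold(header: str) -> str:
    """Join continuation lines onto their field (assumed model of unfolding)."""
    return re.sub(r"\n(?=[ \t])", "", header)


def condition_matches(header: str) -> bool:
    """True if the approximated condition matches the unfolded header."""
    return CONDITION.search(unfold(header)) is not None


# Hypothetical sample headers.
ordinary = "From: a@example.com\nSubject: hello\nDate: today\n"
folded = "From: a@example.com\nSubject: hello\n there\nDate: today\n"
subject_last = "From: a@example.com\nSubject: hello\n"
empty_subject = "From: a@example.com\nSubject:\nDate: today\n"

for name, header in [
    ("ordinary", ordinary),
    ("folded", folded),
    ("subject_last", subject_last),
    ("empty_subject", empty_subject),
]:
    print(name, condition_matches(header))
```

In this model the ordinary and the folded message both match, so the condition
cannot tell them apart; only the two contrived cases (Subject: last, or
completely empty) escape it.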

The question of how to test for a continued header line has come up before.
Here's one solution:

 :0wh
 * ^\/Subject:.*
 dummy=| egrep "$\MATCH"

 :0e: # grep exited 1 if the subject had a continuation
 continued_subjects

Note that the following similar approach will not work, because procmail will
unfold the continued header line before feeding text to an exit code test:

 :0: # This does not work.
 * ^\/Subject:.*
 * ! ? egrep "$\MATCH"
 continued_subjects
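
The difference between the two recipes above can be modelled in a few lines of
Python.  This is a sketch under assumptions, not a description of procmail's
internals: it assumes MATCH holds the unfolded Subject: field, that egrep in
the first recipe reads the header as it arrived, and that the exit code test
in the second recipe reads an unfolded copy.  All function names and sample
headers are hypothetical.

```python
import re


def unfold(header: str) -> str:
    """Join continuation lines onto their field (assumed model of unfolding)."""
    return re.sub(r"\n(?=[ \t])", "", header)


def subject_match(header: str):
    # Stand-in for the Subject condition with the match operator: returns the
    # unfolded Subject field (what MATCH is assumed to hold), or None.
    m = re.search(r"^Subject:.*", unfold(header), re.MULTILINE)
    return m.group(0) if m else None


def grep_finds(text: str, literal: str) -> bool:
    # Stand-in for egrep with a fully escaped pattern: a line-by-line search
    # for a literal string.
    return any(literal in line for line in text.split("\n"))


def first_recipe_flags(header: str) -> bool:
    # egrep sees the raw header, so a folded subject is never on one line.
    match = subject_match(header)
    return match is not None and not grep_finds(header, match)


def second_recipe_flags(header: str) -> bool:
    # egrep sees the unfolded header, so the subject is always found.
    match = subject_match(header)
    return match is not None and not grep_finds(unfold(header), match)


# Hypothetical sample headers.
ordinary = "From: a@example.com\nSubject: hello there\nDate: today\n"
folded = "From: a@example.com\nSubject: hello\n there\nDate: today\n"

for name, header in [("ordinary", ordinary), ("folded", folded)]:
    print(name, first_recipe_flags(header), second_recipe_flags(header))
```

Under those assumptions only the first approach separates the folded message
from the ordinary one; the second never fires, because the text being searched
has already lost the line break it was supposed to detect.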



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
