procmail

Re: Malformed RFC822 headers - continuation lines not indented

2002-01-26 12:39:32
I had guessed,

| Formail would certainly consider "continued header not
| indented" to be the start of the body and would insert an empty line above
| it unless you invoke formail with the -f option, perhaps even then.

Formail, with or without -f, adds a blank line above the improperly
unindented continuation line.

| I don't
| know offhand whether a procmail recipe with either `h' or `b' but not both
| would consider "continued header not indented" the start of the body or
| would look for an empty line: my hunch is the latter.

Procmail recipes, with or without the `r' flag, consider the head to end at
the first empty line, even if that means including an unindented continuation
line in the middle, and the body to start after the first empty line.
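
To make that rule concrete, here is a minimal sketch in plain sh and sed (not procmail itself) that splits a sample message at its first empty line, the way the behavior above implies.  The message text and the variable names head_part and body_part are made up for illustration.

```shell
#!/bin/sh
# Sketch only: mimic the "head ends at the first empty line" rule with sed.
# This does not invoke procmail; it just illustrates the split.

msg='From: someone@example.invalid
Subject: a header that
wraps without indentation

This is the body.'

# Head: everything before the first empty line, including the
# unindented continuation line.
head_part=$(printf '%s\n' "$msg" | sed '/^$/,$d')

# Body: everything after the first empty line.
body_part=$(printf '%s\n' "$msg" | sed '1,/^$/d')

printf 'HEAD:\n%s\n' "$head_part"
printf 'BODY:\n%s\n' "$body_part"
```

The unindented line "wraps without indentation" lands in the head here, which is the treatment I saw from procmail.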

Again, I don't know whether John considers such treatment "graceful" or not,
but it's what I found to be the case.  Those tests were done under the
3.23pre versions (without the latest patch to adjust the maximum score values
on systems that allow sixty-four-bit integers).

Come to think of it, procmail's behavior makes a cure possible, because the
newline before the indentation of a proper continuation line doesn't match ^
or $, but the newline at the beginning of an improperly unindented one does
(yes, I tried it).  The only trick is leaving any postmark lines unmolested:

 POSTMARK
 :0 # Who says you can't make a left side greedy?
 * ^^\/From .+$(>From .+$)*
 { POSTMARK=$MATCH }
 :0 # And hail the preservation of a trailing newline in $MATCH.
 * $ ^^$\POSTMARK\/(.+$)+$
 { RFC822AREA=$MATCH }
 :0hfw
 * RFC822AREA ?? ^[^ 	][^ 	:]*([ 	]|$)
 | sed -e '1{' -e '/^From /n' -e '/^>From /b' -e'}' \
   -e 's/^[^ 	][^ 	:]*[ 	]/	&/' -e 's/^[^ 	][^ 	:]*$/	&/'

where the whitespaces inside brackets are <space><tab> and those to the left
of the ampersands are tabs.  
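
For anyone who wants to try the substitutions outside procmail, the sed part can be sketched as a small sh function.  This is a stand-in under stated assumptions, not the recipe itself: fix_head is a hypothetical name, the sample head is invented, the tab is built with printf so nothing depends on tabs surviving the mail, and the second guard is written as >From to agree with the POSTMARK condition.

```shell
#!/bin/sh
# Sketch only: indent header-area lines whose first word contains no colon.
# TAB holds a literal tab so the script does not rely on sed knowing \t.
TAB=$(printf '\t')

# fix_head is a hypothetical helper; it reads a message head on stdin.
fix_head() {
    # On line 1, print a "From " postmark untouched and move on; leave a
    # following ">From " line alone as well.  Then indent any line that
    # starts with a colon-free word, with or without text after it.
    sed -e '1{' -e '/^From /n' -e '/^>From /b' -e '}' \
        -e "s/^[^ ${TAB}][^ ${TAB}:]*[ ${TAB}]/${TAB}&/" \
        -e "s/^[^ ${TAB}][^ ${TAB}:]*\$/${TAB}&/"
}

# Invented sample: two continuation lines are missing their indentation.
printf '%s\n' \
    'From someone Sat Jan 26 12:39:32 2002' \
    'Received: from a.example.invalid' \
    'by b.example.invalid' \
    'Subject: test' \
    'orphan' | fix_head
```

With that sample, the lines "by b.example.invalid" and "orphan" should come out with a leading tab, while the postmark and the proper header lines pass through unchanged.  The sketch protects only one >From line after the postmark.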

The drawback is that perhaps a message will have no empty lines at all, so
the above will render it all into one huge head with no body (as procmail
already thought it was) by indenting everything that doesn't look like a
proper header line.  I suppose we could make sed stop substituting at the
first line that is non-empty but blank (i.e., all spaces and tabs but at
least one, ^[ 	]+$) by adding this instruction between the closing brace and
the substitutions:

   -e '/^[ 	][ 	]*$/{' -e:a -en -eba -e'}' \

but I've received mail that actually had Received: headers with a non-empty
blank line intended as a continuation line, like this:

Received: some printing text here
    some more here
    
    rest of the header line

where the third line had just a tab on it.  It's unfriendly and unnecessary,
but it's apparently permitted.
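
The trade-off can be seen with a sketch that runs the substitutions with and without the stop-at-blank-line guard.  fix_plain and fix_guarded are hypothetical names and the sample head is invented; its Received: lines follow the shape of the example above, with a real tab on the third line.  The postmark handling is left out to keep it short.

```shell
#!/bin/sh
# Sketch only: compare the substitutions with and without the guard that
# stops at the first whitespace-only line.  Function names are hypothetical.
TAB=$(printf '\t')

# Plain version: indent lines that start with a colon-free word.
fix_plain() {
    sed -e "s/^[^ ${TAB}][^ ${TAB}:]*[ ${TAB}]/${TAB}&/" \
        -e "s/^[^ ${TAB}][^ ${TAB}:]*\$/${TAB}&/"
}

# Guarded version: at a whitespace-only line, loop on n so the rest of the
# input is printed without any further substitution.
fix_guarded() {
    sed -e "/^[ ${TAB}][ ${TAB}]*\$/{" -e ':a' -e 'n' -e 'ba' -e '}' \
        -e "s/^[^ ${TAB}][^ ${TAB}:]*[ ${TAB}]/${TAB}&/" \
        -e "s/^[^ ${TAB}][^ ${TAB}:]*\$/${TAB}&/"
}

# Invented head: a tab-only continuation line, then a badly wrapped Subject.
sample() {
    printf 'Received: some printing text here\n'
    printf '\tsome more here\n'
    printf '\t\n'
    printf '\trest of the header line\n'
    printf 'Subject: wrapped\n'
    printf 'subject text\n'
}

plain_out=$(sample | fix_plain)
guarded_out=$(sample | fix_guarded)
printf 'plain:\n%s\n\nguarded:\n%s\n' "$plain_out" "$guarded_out"
```

The plain version indents "subject text"; the guarded one gives up at the tab-only line and leaves it unrepaired, which is exactly the cost of the guard on mail like this.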

DWT

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
