procmail

Re: Parsing multiple headers.

1999-03-16 17:31:39
Mark Swanson <Mark(_dot_)Swanson(_at_)SARAIDE(_dot_)COM> writes:
Currently I'm using formail to parse the Content-Type header. In the case
of a MIME multipart message there will be more than one of these
headers. Is there a way to get all of them? For example, in /etc/procmailrc
...

The only sure-fire way to identify the embedded headers of a multipart
message is to do a full MIME parse.  On the other hand, if you know the
message is a multipart message (from the content-type header in the
main header), you can make a relatively good guess as to whether any of
the parts are of a given type by just matching for the appropriate
string against the body:

        :0
        * ^Content-Type: *multipart/
        * B ?? ^Content-Type: *text/html
        { whatever }

A more drastic method is to extract the boundary string and use that to
locate the embedded headers:

        :0
        * ^Content-Type: *multipart/.*boundary=("\/[^"]+|\/[^" ;	]+)
        * B ?? $ ^--$MATCH($)(.+$)*Content-Type: *text/html
        { whatever }


I _think_ that combo of conditions is relatively robust, but I am not
100% sure.
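
To make it concrete what those two conditions do, here is a rough
emulation in Python. It is only a sketch under assumptions: the function
name has_part_of_type and the sample message are made up for
illustration, the header is unfolded by hand (procmail does that itself
before matching), and the boundary is escaped with re.escape, which the
recipe above does not do.

```python
import re

# Hypothetical sample message, for illustration only.
SAMPLE = (
    "From: someone@example.com\n"
    "Content-Type: multipart/alternative;\n"
    ' boundary="XYZ"\n'
    "\n"
    "preamble\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Transfer-Encoding: 7bit\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XYZ--\n"
)


def has_part_of_type(raw, wanted="text/html"):
    """Guess whether a multipart message has a part of the given type."""
    head, _, body = raw.partition("\n\n")
    # Unfold continued header lines so the boundary parameter is found
    # even when it sits on a continuation line.
    head = re.sub(r"\n[ \t]+", " ", head)
    # First condition: pull the boundary (quoted or bare) out of the
    # main Content-Type header.
    m = re.search(
        r'^Content-Type: *multipart/.*boundary=(?:"([^"]+)|([^" ;\t]+))',
        head,
        re.IGNORECASE | re.MULTILINE,
    )
    if m is None:
        return False
    boundary = m.group(1) or m.group(2)
    # Second condition: a boundary line, then any run of non-empty lines
    # (the part's headers), then the Content-Type being looked for.
    pattern = (
        r"^--" + re.escape(boundary) + r"\n(?:.+\n)*Content-Type: *"
        + re.escape(wanted)
    )
    return re.search(pattern, body, re.IGNORECASE | re.MULTILINE) is not None


print(has_part_of_type(SAMPLE))
print(has_part_of_type(SAMPLE, "image/png"))
```

Because the run of non-empty lines stops at the first blank line, a
Content-Type string that merely appears in the text of a part does not
count, which is the advantage over the plain body match.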

However, if you want a complete list of the embedded content types, you
should hand the job off to a full-blown programming language like Perl,
where a complete MIME parse can be done and its results worked with.
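
As a minimal sketch of that approach, here it is in Python rather than
Perl, using the standard library's email package. The function name
embedded_content_types and the sample message are assumptions made up for
illustration; a real filter would presumably read the message from
standard input instead.

```python
from email import message_from_string

# Hypothetical nested multipart message, for illustration only.
SAMPLE = (
    "From: someone@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: multipart/alternative; boundary="inner"\n'
    "\n"
    "--inner\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--inner\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--inner--\n"
    "--outer\n"
    "Content-Type: image/png\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "iVBORw0KGgo=\n"
    "--outer--\n"
)


def embedded_content_types(raw):
    """Return the content type of every embedded part, in document order."""
    msg = message_from_string(raw)
    # walk() yields the top-level message first and then every nested
    # part, so skip the top-level object to list only embedded parts.
    return [part.get_content_type() for part in msg.walk() if part is not msg]


for content_type in embedded_content_types(SAMPLE):
    print(content_type)
```

Unlike the regexp conditions, this follows nested multiparts, so the
parts inside the inner multipart/alternative are listed as well.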


Philip Guenther
