procmail

Re: convert a HTML multipart message to a plain Text formated Message

2006-10-16 10:33:40

On Mon, 16 Oct 2006, Matthias H?ker wrote:


First, thanks for all the hints and tips.


After studying RFC 1521, fixing some problems
with demime, tweaking stripmime, and learning about
makemime and reformime from the maildrop package,
I finally puzzled something together.

The upside is that it is working :)
The downside is that I need to know every MIME
attachment type that is possible.

I need to understand how I can make a procmail recipe
that recursively reads out all Content-Type: fields from the body,
except text/plain and text/html,
and puts the findings in a comma-separated list like

application/octet-stream, image/jpeg ......

        Are you doing this as an exercise to learn procmail, or do you
        need it in production?  I ask because you fork to external
        programs and back to procmail over and over.  You can group the
        'fw' recipes into one shell script using sed/awk or perl and
        take load off the server's message queue.  Remember that if you
        need to fix something under water, you don't call a diver and
        teach him locksmithing; you call a locksmith and teach him how
        to dive.
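
        As a rough sketch of the single-script approach, the shell
        function below collects the Content-Type values from whatever
        it reads on stdin into a comma-separated list, skipping
        text/plain and text/html.  The name list_mime_types is made up
        for illustration.  Folded (multi-line) Content-Type headers are
        not handled, and container types such as multipart/* are not
        filtered out, so treat it as a starting point rather than a
        finished tool.

```shell
# Hypothetical helper: print a comma-separated list of the MIME types
# found in Content-Type: lines on stdin, except text/plain and text/html.
list_mime_types() {
    awk '
        tolower($0) ~ /^content-type:/ {
            line = tolower($0)
            # Drop the field name, then everything after the type itself.
            sub(/^content-type:[ \t]*/, "", line)
            sub(/[; \t\r].*$/, "", line)
            if (line == "" || line == "text/plain" || line == "text/html")
                next
            # Keep each type once, in order of first appearance.
            if (!(line in seen)) {
                seen[line] = 1
                out = out (out == "" ? "" : ",") line
            }
        }
        END { print out }
    '
}
```

        Saved as a script and fed the message body, its output could
        then be handed to the -i parameter of stripmime.pl, assuming
        that parameter accepts such a list.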


With this I could use the -i parameter of
stripmime.pl to rescue all attachments.

If someone could help me with that, it would be nice.

Other comments are highly welcome!

*****
I am sure there are other ways to do it, but ....

VERBOSE = ON
#saving a copy just in case of error :)
:0 c:
/var/mail/test

#thanks to BART :)
:0
* 9876543210^0 Content-Type(.|$[    ])*boundary="\/[^"]+
* 9876543210^0 Content-Type(.|$[    ])*boundary=\/[^;    ]+
{ BOUNDARY=$MATCH }

#make sure there is something in BOUNDARY
:0
* ! BOUNDARY ?? .
{ BOUNDARY = "xyz123789klopqrs" }

:0 c w
{
#make sure /tmp/tmp.000 exists and is empty
:0 w
* ? ( echo "" > /tmp/tmp.000 )
{ X="" }

        The parentheses are unnecessary.  If you set SHELL=/bin/sh
        (recommended), you can just use  >/tmp/tmp.000
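
        A small sketch of the difference, assuming a POSIX shell:
        echo "" writes a newline, so the file ends up one byte long
        rather than empty, while a bare redirection truncates it to
        zero bytes.  The function name truncate_file is hypothetical.

```shell
# Truncate (or create) a file so that it is really empty.
# ':' is the shell's no-op builtin; the redirection does the work.
truncate_file() {
    : > "$1"
}
```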



#get the TEXT part, or the HTML part formatted as TEXT
:0 fw
|/usr/local/bin/perl /etc/admin/perlscript/demime.pl -

#removing the email header
:0 fw
|formail -I ""

        Why remove the header?  You can work on the body alone with
        the 'b' flag (feed only the body to the pipe) or the 'B' flag
        (match the conditions against the body) in any recipe.



#make a MIME TEXT part
# and let makemime guess the encoding type
:0 fw
|makemime -c "text/plain;" -o /tmp/tmp.001 -

#tell procmail that this copy has arrived at its destination
:0
/dev/null
}

#getting a new copy  only if the email is multipart
:0 c w
* ^Content-Type:.*multipart
{
#get the MIME parts we like and strip the ones we don't
# i left image/gif there to have something to test
# but it could be a comma separated list
# some types i already hardcoded into stripmime
:0 fw
|/etc/admin/perlscript/stripmime.pl -e text/plain -i image/gif  -m  -h

:0
/tmp/tmp.000
}



   (1)
#erasing the body of the original mail
:0 fbw
|cat - > /dev/null


   (2)
#inserting the first boundary in the body
#and don't add a newline
:0 fbw
|echo -n --$BOUNDARY

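        Just as an example, (1)+(2) can be combined into one recipe:

# awk reads and discards the old body, then prints the boundary without a newline
:0 fbw
| awk -v b="$BOUNDARY" 'END { printf("--%s", b) }'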



#replacing or inserting the Content-Type header
# because we have no multipart/alternative anymore
#we need a multipart mixed
#regarding RFC1521 a subtype is not mandatory
:0 fhw
|formail -I "Content-Type: multipart/mixed; boundary=\"$BOUNDARY\""

#put it all together and repair missing boundaries,
#character sets and ..... with reformime
:0 fw
|cat - /tmp/tmp.001 /tmp/tmp.000|reformime -r

#deliver it
:0:
/var/mail/test


        I use metamail to unpack all the parts into one directory.  The
        command `file *' then gives the type of each part (not in MIME
        type format), and you can group the parts and rebuild the message
        from the ones you want.  If you want to try it, I'll send you my
        very small script.
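
        A minimal sketch of the "unpack, then inspect" idea, assuming
        the parts have already been unpacked into a directory and that
        the local file(1) supports the --mime-type option (many current
        versions do, and it gives MIME-style names directly).  The
        function name list_part_types and its directory argument are
        hypothetical; this is not the script offered above.

```shell
# Hypothetical helper: print "name<TAB>mime/type" for every regular
# file in the directory given as the first argument.
list_part_types() {
    dir=$1
    for part in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$part" ] || continue
        printf '%s\t%s\n' "${part##*/}" "$(file -b --mime-type "$part")"
    done
}
```

        The output can then be filtered (for example with grep or awk)
        to choose which parts go back into the rebuilt message.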

Bye,
  Udi

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail